08/598,591 filed on February 12, 1996.

CODING SEQUENCES OF THE HUMAN BRCA1 GENE 


FIELD OF THE INVENTION 

This invention relates to a gene which has been associated with breast and
ovarian cancer where the gene is found to be mutated. More specifically, this
invention relates to the three coding sequences of the BRCA1 gene, BRCA1(omi1),
BRCA1(omi2), and BRCA1(omi3), isolated from human subjects.

BACKGROUND OF THE INVENTION

It has been estimated that about 5-10% of breast cancer is inherited. Rowell,
S., et al., American Journal of Human Genetics 55:861-865 (1994). Located on
chromosome 17, BRCA1 is the first gene identified as conferring increased risk
for breast and ovarian cancer. Miki et al., Science 266:66-71 (1994). Mutations in
this "tumor suppressor" gene are thought to account for roughly 45% of
inherited breast cancer and 80-90% of families with increased risk of early onset
breast and ovarian cancer. Easton et al., American Journal of Human Genetics
52:678-701 (1993).

Locating one or more mutations in the BRCA1 region of chromosome 17 
provides a promising approach to reducing the high incidence and mortality 
associated with breast and ovarian cancer through the early detection of women 
at high risk. These women, once identified, can be targeted for more aggressive 
prevention programs. Screening is carried out by a variety of methods which
include karyotyping, probe binding and DNA sequencing. 

In DNA sequencing technology, genomic DNA is extracted from whole
blood and the coding sequences of the BRCA1 gene are amplified. The coding
sequences may be sequenced completely and the results compared to the
normal DNA sequence of the gene. Alternatively, the coding sequence of the sample
gene may be compared to a panel of known mutations before completely
sequencing the gene and comparing it to a normal sequence of the gene.
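
The comparison step described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the laboratory procedure: the function name `find_differences`, the toy sequences, and the 1-based position convention are all assumptions, and the sketch presumes the sample and reference have already been aligned to equal length.

```python
def find_differences(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, ref_base, smp_base)
        for i, (ref_base, smp_base) in enumerate(zip(reference.upper(), sample.upper()))
        if ref_base != smp_base
    ]


# Toy sequences for illustration only; they are not BRCA1 sequence.
reference = "ATGGATTTATCTGCT"
sample = "ATGGATTTGTCTGCT"
differences = find_differences(reference, sample)
```

Each reported difference would still have to be interpreted, since a base change may be either a harmless polymorphism or a pathologic mutation.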

If a mutation in the BRCA1 coding sequence is found, it may be possible to
provide the individual with increased expression of the gene through gene
transfer therapy. It has been demonstrated that the gene transfer of the BRCA1
coding sequence into cancer cells inhibits their growth and reduces
tumorigenesis of human cancer cells in nude mice. Jeffrey Holt and his
colleagues conclude that the product of BRCA1 expression is a secreted tumor
growth inhibitor, making BRCA1 an ideal gene for gene therapy studies.
Transduction of only a moderate percentage of tumor cells apparently produces
enough growth inhibitor to inhibit all tumor cells. Arteaga, CL, and JT Holt,
Cancer Research 56: 1098-1103 (1996); Holt, JT et al., Nature Genetics 12: 298-302
(1996).

The observation of Holt et al. that the BRCA1 growth inhibitor is a
secreted protein leads to the possible use of injection of the growth inhibitor into
the area of the tumor for tumor suppression.

The BRCA1 gene is divided into 24 separate exons. Exons 1 and 4 are
noncoding, in that they are not part of the final functional BRCA1 protein
product. The BRCA1 coding sequence spans roughly 5600 base pairs (bp). Each
exon consists of 200-400 bp, except for exon 11 which contains about 3600 bp. To
sequence the coding sequence of the BRCA1 gene, each exon is amplified
separately and the resulting PCR products are sequenced in the forward and
reverse directions. Because exon 11 is so large, we have divided it into twelve
overlapping PCR fragments of roughly 350 bp each (segments "A" through "L" of
BRCA1 exon 11).
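
The arithmetic behind dividing exon 11 into overlapping fragments can be illustrated as follows. This is a sketch under stated assumptions: the actual fragment boundaries and primer positions are not given here, so evenly spaced start positions are assumed, and the function name `tile_region` and the rounded figures of 3600 bp and 350 bp are used purely for illustration.

```python
def tile_region(length: int, n_fragments: int, fragment_size: int) -> list[tuple[int, int]]:
    """Return n_fragments (start, end) pairs, 0-based and end-exclusive, that
    cover [0, length) with evenly spaced fragments of fragment_size."""
    if n_fragments < 2 or fragment_size >= length:
        raise ValueError("need at least two fragments shorter than the region")
    if n_fragments * fragment_size < length:
        raise ValueError("fragments are too short to cover the region")
    # Spread the start positions evenly so the last fragment ends at `length`.
    step = (length - fragment_size) / (n_fragments - 1)
    starts = [round(i * step) for i in range(n_fragments)]
    return [(start, start + fragment_size) for start in starts]


# Approximate figures from the text: about 3600 bp in 12 fragments of about 350 bp.
fragments = tile_region(3600, 12, 350)
named = dict(zip("ABCDEFGHIJKL", fragments))  # segments "A" through "L"
```

With these figures, neighbouring fragments overlap by roughly 55 bp, so every base of the exon falls inside at least one fragment.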

Many mutations and polymorphisms have already been reported in the
BRCA1 gene. A World Wide Web site has been built to facilitate the detection and
characterization of alterations in breast cancer susceptibility genes. Such
mutations in BRCA1 can be accessed through the Breast Cancer Information Core
at: http://www.nchgr.nih.gov/dir/lab_transfer/bic. This data site became
publicly available on November 1, 1995. Friend, S. et al., Nature Genetics 11:238
(1995).

The genetics of Breast/Ovarian Cancer Syndrome is autosomal dominant
with reduced penetrance. In simple terms, this means that the syndrome runs
through families such that both sexes can be carriers (only women get the disease
but men can pass it on); all generations will likely have breast or ovarian cancer,
or both, sometimes in the same individual; and occasionally women carriers
either die young before they have time to manifest disease (and yet their offspring
get it) or never develop breast or ovarian cancer and die of old age (the latter
are said to show "reduced penetrance" because they never develop cancer).
Pedigree analysis and genetic counseling are absolutely essential to the proper
workup of a family prior to any lab work.

Until now, only a single coding sequence for the BRCA1 gene has been
available for comparison to patient samples. That sequence is available as
GenBank Accession Number U14680. There is a need in the art, therefore, to
have available a coding sequence which is the BRCA1 coding sequence found in
the majority of the population, a "consensus coding sequence", BRCA1(omi1),
SEQ. ID. NO.: 1. A consensus coding sequence will make it possible for true mutations
to be easily identified or differentiated from polymorphisms. Identification of
mutations of the BRCA1 gene and protein would allow more widespread
diagnostic screening for hereditary breast and ovarian cancer than is currently
possible. Two additional coding sequences have been isolated and characterized.
The BRCA1(omi2), SEQ. ID. NO.: 3, and BRCA1(omi3), SEQ. ID. NO.: 5, coding
sequences also have utility in diagnosis, gene therapy and in making therapeutic
BRCA1 protein.

A coding sequence of the BRCA1 gene which occurs most commonly in
the human gene pool is provided. The most commonly occurring coding
sequence more accurately reflects the most likely sequence to be found in a
subject. Use of the coding sequence BRCA1(omi1), SEQ. ID. NO.: 1, rather than the
previously published BRCA1 sequence, will reduce the likelihood of
mistaking a "sequence variation" found in the population (i.e., a
polymorphism) for a pathologic "mutation" (i.e., one that causes disease in the
individual or puts the individual at a high risk of developing the disease). With
large interest in breast cancer predisposition testing, misinterpretation is
particularly worrisome. People who already have breast cancer are asking the
clinical question: "Is my disease caused by a heritable genetic mutation?" The
relatives of those with breast cancer are asking the question: "Am I also a
carrier of the mutation my relative has? Thus, is my risk increased, and should I
undergo a more aggressive surveillance program?"

SUMMARY OF THE INVENTION 

The present invention is based on the isolation of three coding sequences
of the BRCA1 gene found in human individuals.

It is an object of the invention to provide the most commonly occurring
coding sequence of the BRCA1 gene.

It is another object of this invention to provide two other coding
sequences of the BRCA1 gene.

It is another object of the invention to provide three protein sequences
coded for by three of the coding sequences of the BRCA1 gene.

It is another object of the invention to provide a list of the codon pairs
which occur at each of seven polymorphic points on the BRCA1 gene.

It is another object of the invention to provide the rates of occurrence for
the codons.

It is another object of the invention to provide a method wherein BRCA1,
or parts thereof, is amplified with one or more oligonucleotide primers.

It is another object of this invention to provide a method of identifying
individuals who carry no mutation(s) of the BRCA1 coding sequence and
therefore have no increased genetic susceptibility to breast or ovarian cancer
based on their BRCA1 genes.

It is another object of this invention to provide a method of identifying a 
mutation leading to an increased genetic susceptibility to breast or ovarian 
cancer. 

There is a need in the art for a sequence of the BRCA1 gene and for the
protein sequence of BRCA1 as well as for an accurate list of codons which occur at
polymorphic points on a sequence.

A person skilled in the art of genetic susceptibility testing will find the present
invention useful for:

a) identifying individuals having a BRCA1 gene with no coding
mutations, who therefore cannot be said to have an increased
genetic susceptibility to breast or ovarian cancer from their BRCA1
genes;

b) avoiding misinterpretation of polymorphisms found in the BRCA1
gene;

c) determining the presence of a previously unknown mutation in the
BRCA1 gene;

d) identifying a mutation which increases the genetic susceptibility to
breast or ovarian cancer;

e) probing a human sample of the BRCA1 gene;

f) performing gene therapy; and

g) making a functioning tumor growth inhibitor protein coded for
by one of the BRCA1(omi) genes.

BRIEF DESCRIPTION OF THE FIGURE 

As shown in FIGURE 1, the alternative alleles at polymorphic (non-mutation
causing variations) sites along a chromosome can be represented as a "haplotype"
within a gene such as BRCA1. The BRCA1(omi1) haplotype is shown in Figure 1
with dark shading (encompassing the alternative alleles found at nucleotide sites
2201, 2430, 2731, 3232, 3667, 4427, and 4956). For comparison, the haplotype that is
in GenBank is shown with no shading. As can be seen from the figure, the
common "consensus" haplotype is found intact in five separate chromosomes
labeled with the OMI symbol (numbers 1-5 from left to right). Two additional
haplotypes (BRCA1(omi2) and BRCA1(omi3)) are represented with mixed dark and
light shading (numbers 7 and 9 from left to right). In total, 7 of 10 haplotypes
along the BRCA1 gene are unique.

DETAILED DESCRIPTION OF THE INVENTION

DEFINITIONS

The following definitions are provided for the purpose of understanding this
invention.

"Breast and Ovarian cancer" is understood by those skilled in the art to
include breast and ovarian cancer in women and also breast and prostate cancer
in men. BRCA1 is associated with genetic susceptibility to inherited breast and
ovarian cancer in women and also breast and prostate cancer in men. Therefore,
claims in this document which recite breast and/or ovarian cancer refer to breast,
ovarian and prostate cancers in men and women.

"Coding sequence" or "DNA coding sequence" refers to those portions of a
gene which, taken together, code for a peptide (protein), or which nucleic acid
itself has function.

"Protein" or "peptide" refers to a sequence of amino acids which has
function.

"BRCA1(omi)" refers collectively to the "BRCA1(omi1)", "BRCA1(omi2)" and
"BRCA1(omi3)" coding sequences.

"BRCA1(omi1)" refers to SEQ. ID. NO.: 1, a coding sequence for the BRCA1
gene. The coding sequence was found by end to end sequencing of BRCA1 alleles
from individuals randomly drawn from a Caucasian population found to have
no family history of breast or ovarian cancer. The sequenced gene was found not
to contain any mutations. BRCA1(omi1) was determined to be a consensus
sequence by calculating the frequency with which the coding sequence occurred
among the sample alleles sequenced.

"BRCA1(omi2)" and "BRCA1(omi3)" refer to SEQ. ID. NO.: 3 and SEQ. ID. NO.:
5, respectively. They are two additional coding sequences for the BRCA1 gene
which were also isolated from individuals randomly drawn from a Caucasian
population found to have no family history of breast or ovarian cancer.

"Primer" as used herein refers to a sequence comprising about 20 or more
nucleotides of the BRCA1 gene.

"Genetic susceptibility" refers to the susceptibility to breast or ovarian 
cancer due to the presence of a mutation in the BRCA1 gene. 

A "target polynucleotide" refers to the nucleic acid sequence of interest e.g., 
the BRCA1 encoding polynucleotide. Other primers which can be used for 
primer hybridization will be known or readily ascertainable to those of skill in 
the art. 

"Consensus" means the most commonly occurring in the population. 

"Consensus genomic sequence" means the allele of the target gene which 
occurs with the greatest frequency in a population of individuals having no 
family history of disease associated with the target gene. 
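
The frequency-based definition above can be illustrated with a short sketch. It assumes each sequenced allele is available as a plain string; the function name `consensus_allele` and the short example alleles are hypothetical stand-ins, not data from this document.

```python
from collections import Counter


def consensus_allele(alleles: list[str]) -> tuple[str, float]:
    """Return the most frequent allele and its frequency among the sampled alleles."""
    if not alleles:
        raise ValueError("no alleles supplied")
    counts = Counter(alleles)
    allele, count = counts.most_common(1)[0]
    return allele, count / len(alleles)


# Hypothetical short alleles standing in for full-length sequences.
sampled = ["AGTCTG", "AGTCTG", "AGCTTG", "AGTCTG", "AGCCTG"]
allele, frequency = consensus_allele(sampled)
```

In this toy sample the first allele is seen in three of five chromosomes, so it would be taken as the consensus.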

"Substantially complementary to" refers to probe or primer sequences
which hybridize to the sequences provided under stringent conditions and/or
sequences having sufficient homology with BRCA1 sequences, such that the
allele specific oligonucleotide probe or primers hybridize to the BRCA1 sequences
to which they are complementary.

"Haplotype" refers to a series of alleles within a gene on a chromosome. 

"Isolated" as used herein refers to substantially free of other nucleic acids, 
proteins, lipids, carbohydrates or other materials with which they may be
associated. Such association is typically either in cellular material or in a 
synthesis medium. 

"Mutation" refers to a base change or a gain or loss of base pair(s) in a DNA
sequence, which results in a DNA sequence which codes for a non-functioning
protein or a protein with substantially reduced or altered function.

"Polymorphism" refers to a base change which is not associated with 
known pathology. 

"Tumor growth inhibitor protein" refers to the protein coded for by the 
BRCA1 gene. The functional protein is thought to suppress breast and ovarian
tumor growth. 

The invention in several of its embodiments includes:

1. An isolated consensus DNA sequence of the BRCA1 coding sequence as set
forth in SEQ. ID. NO.: 1.

2. A consensus protein sequence of the BRCA1 protein as set forth in
SEQ. ID. NO.: 2.

3. An isolated coding sequence of the BRCA1 gene as set forth in
SEQ. ID. NO.: 3.

4. A protein sequence of the BRCA1 protein as set forth in
SEQ. ID. NO.: 4.

5. An isolated coding sequence of the BRCA1 gene as set forth in
SEQ. ID. NO.: 5.

6. A protein sequence of the BRCA1 protein as set forth in SEQ. ID. NO.: 6.

7. A BRCA1 gene with a BRCA1 coding sequence not associated with
breast or ovarian cancer which comprises an alternative pair of codons, AGC
and AGT, which occur at position 2201 at frequencies of about 35-45%, and
from about 55-65%, respectively.

8. A BRCA1 gene according to Claim 7 wherein AGC occurs at a
frequency of about 40%.

9. A set of at least two alternative codon pairs which occur at
polymorphic positions in a BRCA1 gene with a BRCA1 coding sequence not
associated with breast or ovarian cancer, wherein codon pairs are selected
from the group consisting of:

• AGC and AGT at position 2201;
• TTG and CTG at position 2430;
• CCG and CTG at position 2731;
• GAA and GGA at position 3232;
• AAA and AGA at position 3667;
• TCT and TCC at position 4427; and
• AGT and GGT at position 4956.

10. A set of at least two alternative codon pairs according to claim 9,
wherein the codon pairs occur in the following frequencies, respectively, in a
population of individuals free of disease:

• at position 2201, AGC and AGT occur at frequencies from about
35-45%, and from about 55-65%, respectively;
• at position 2430, TTG and CTG occur at frequencies from about
35-45%, and from about 55-65%, respectively;
• at position 2731, CCG and CTG occur at frequencies from about
25-35%, and from about 65-75%, respectively;
• at position 3232, GAA and GGA occur at frequencies from about
35-45%, and from about 55-65%, respectively;
• at position 3667, AAA and AGA occur at frequencies from about
35-45%, and from about 55-65%, respectively;
• at position 4427, TCT and TCC occur at frequencies from about
45-55%, and from about 45-55%, respectively; and
• at position 4956, AGT and GGT occur at frequencies from about
35-45%, and from about 55-65%, respectively.


11. A set according to Claim 10 which is at least three codon pairs.

12. A set according to Claim 10 which is at least four codon pairs.

13. A set according to Claim 10 which is at least five codon pairs.

14. A set according to Claim 10 which is at least six codon pairs.

15. A set according to Claim 10 which is at least seven codon pairs.



16. A method of identifying individuals having a BRCA1 gene with a
BRCA1 coding sequence not associated with disease, comprising:

(a) amplifying a DNA fragment of an individual's BRCA1
coding sequence using an oligonucleotide primer which
specifically hybridizes to sequences within the gene;

(b) sequencing said amplified DNA fragment by dideoxy
sequencing;

(c) repeating steps (a) and (b) until said individual's BRCA1
coding sequence is completely sequenced;

(d) comparing the sequence of said amplified DNA fragment
to a BRCA1(omi) DNA sequence, SEQ. ID. NO.: 1, SEQ. ID.
NO.: 3, or SEQ. ID. NO.: 5;

(e) determining the presence or absence of each of the
following polymorphic variations in said individual's
BRCA1 coding sequence:

• AGC and AGT at position 2201,
• TTG and CTG at position 2430,
• CCG and CTG at position 2731,
• GAA and GGA at position 3232,
• AAA and AGA at position 3667,
• TCT and TCC at position 4427, and
• AGT and GGT at position 4956;

(f) determining any sequence differences between said
individual's BRCA1 coding sequences and SEQ. ID. NO.: 1,
SEQ. ID. NO.: 3, or SEQ. ID. NO.: 5, wherein the presence of
said polymorphic variations and the absence of a
variation outside of positions 2201, 2430, 2731, 3232,
3667, 4427, and 4956 is correlated with an absence of
increased genetic susceptibility to breast or ovarian
cancer resulting from a BRCA1 mutation in the BRCA1
coding sequence.

17. A method of claim 16 wherein, codon variations occur at the 
following frequencies, respectively, in a population of individuals free of 
disease: 

• at position 2201, AGC and AGT occur at frequencies from about 
35-45%, and from about 55-65%, respectively; 

• at position 2430, TTG and CTG occur at frequencies from about 
35-45%, and from about 55-65%, respectively; 

• at position 2731, CCG and CTG occur at frequencies from about 
25-35%, and from about 65-75%, respectively; 

• at position 3232, GAA and GGA occur at frequencies from about 
35-45%, and from about 55-65%, respectively; 

• at position 3667, AAA and AGA occur at frequencies from about 
35-45%, and from about 55-65%, respectively; 

• at position 4427, TCT and TCC occur at frequencies from about
45-55%, and from about 45-55%, respectively; and 

• at position 4956, AGT and GGT occur at frequencies from about 
35-45%, and from about 55-65%, respectively. 

18. A method according to claim 16 wherein said oligonucleotide primer is 
labeled with a radiolabel, a fluorescent label, a bioluminescent label, a
chemiluminescent label, or an enzyme label. 

19. A method of detecting an increased genetic susceptibility to breast and
ovarian cancer in an individual resulting from the presence of a mutation in
the BRCA1 coding sequence, comprising:

(a) amplifying a DNA fragment of an individual's BRCA1
coding sequence using an oligonucleotide primer which
specifically hybridizes to sequences within the gene;

(b) sequencing said amplified DNA fragment by dideoxy
sequencing;

(c) repeating steps (a) and (b) until said individual's BRCA1
coding sequence is completely sequenced;

(d) comparing the sequence of said amplified DNA fragment
to a BRCA1(omi) DNA sequence, SEQ. ID. NO.: 1, SEQ. ID.
NO.: 3, or SEQ. ID. NO.: 5;

(e) determining any sequence differences between said
individual's BRCA1 coding sequences and SEQ. ID. NO.: 1,
SEQ. ID. NO.: 3, or SEQ. ID. NO.: 5, to determine the
presence or absence of base changes in said individual's
BRCA1 coding sequence, wherein a base change which is
not any one of the following:

• AGC and AGT at position 2201,
• TTG and CTG at position 2430,
• CCG and CTG at position 2731,
• GAA and GGA at position 3232,
• AAA and AGA at position 3667,
• TCT and TCC at position 4427, and
• AGT and GGT at position 4956

is correlated with the potential of increased genetic susceptibility to
breast or ovarian cancer resulting from a BRCA1
mutation in the BRCA1 coding sequence.

20. A method of claim 19 wherein codon variations occur at the following
frequencies, respectively, in a population free of disease:

• at position 2201, AGC and AGT occur at frequencies of about
40%, and from about 55-65%, respectively;
• at position 2430, TTG and CTG occur at frequencies from about
35-45%, and from about 55-65%, respectively;
• at position 2731, CCG and CTG occur at frequencies from about
25-35%, and from about 65-75%, respectively;
• at position 3232, GAA and GGA occur at frequencies from about
35-45%, and from about 55-65%, respectively;
• at position 3667, AAA and AGA occur at frequencies from about
35-45%, and from about 55-65%, respectively;
• at position 4427, TCT and TCC occur at frequencies from about
45-55%, and from about 45-55%, respectively; and
• at position 4956, AGT and GGT occur at frequencies from about
35-45%, and from about 55-65%, respectively.

21. A method according to claim 19 wherein said oligonucleotide primer is
labeled with a radiolabel, a fluorescent label, a bioluminescent label, a
chemiluminescent label, or an enzyme label.

22. A set of codon pairs, which occur at polymorphic positions in a
BRCA1 gene with a BRCA1 coding sequence according to Claim 1, wherein
said set of codon pairs is:

• AGC and AGT at position 2201;
• TTG and CTG at position 2430;
• CCG and CTG at position 2731;
• GAA and GGA at position 3232;
• AAA and AGA at position 3667;
• TCT and TCC at position 4427; and
• AGT and GGT at position 4956.

23. A set of at least two alternative codon pairs according to claim 22
wherein the set of at least two alternative codon pairs occurs at the following
frequencies:

• at position 2201, AGC and AGT occur at frequencies of about
40%, and from about 55-65%, respectively;
• at position 2430, TTG and CTG occur at frequencies from about
35-45%, and from about 55-65%, respectively;
• at position 2731, CCG and CTG occur at frequencies from about
25-35%, and from about 65-75%, respectively;
• at position 3232, GAA and GGA occur at frequencies from about
35-45%, and from about 55-65%, respectively;
• at position 3667, AAA and AGA occur at frequencies from about
35-45%, and from about 55-65%, respectively;
• at position 4427, TCT and TCC occur at frequencies from about
45-55%, and from about 45-55%, respectively; and
• at position 4956, AGT and GGT occur at frequencies from about
35-45%, and from about 55-65%, respectively.



24. A BRCA1 coding sequence according to claim 1 wherein the codon
pairs occur at the following frequencies:

• at position 2201, AGC and AGT occur at frequencies of about
40%, and from about 55-65%, respectively;
• at position 2430, TTG and CTG occur at frequencies from about
35-45%, and from about 55-65%, respectively;
• at position 2731, CCG and CTG occur at frequencies from about
25-35%, and from about 65-75%, respectively;
• at position 3232, GAA and GGA occur at frequencies from about
35-45%, and from about 55-65%, respectively;
• at position 3667, AAA and AGA occur at frequencies from about
35-45%, and from about 55-65%, respectively;
• at position 4427, TCT and TCC occur at frequencies from about
45-55%, and from about 45-55%, respectively; and
• at position 4956, AGT and GGT occur at frequencies from about
35-45%, and from about 55-65%, respectively.



25. A method of determining the consensus genomic sequence or consensus
coding sequence for a target gene, comprising:

a) screening a number of individuals in a population for a family history
which indicates inheritance of normal alleles for a target gene;

b) isolating at least one allele of the target gene from individuals found to
have a family history which indicates inheritance of normal alleles for a target
gene;

c) sequencing each allele;

d) comparing the nucleic acid sequence of the genomic sequence or of the
coding sequence of each allele of the target gene to determine similarities and
differences in the nucleic acid sequence; and

e) determining which allele of the target gene occurs with the greatest
frequency.



26. A method of performing gene therapy, comprising:

a) transfecting cancer cells in vivo with an effective amount of a
vector transformed with a BRCA1 coding sequence of SEQ.
ID. NO.: 1, SEQ. ID. NO.: 3, or SEQ. ID. NO.: 5;

b) allowing the cells to take up the vector; and

c) measuring a reduction in tumor growth.

27. A method of performing protein therapy, comprising:

a) injecting into a patient an effective amount of BRCA1 tumor
growth inhibiting protein of SEQ. ID. NO.: 2, SEQ. ID. NO.:
4, or SEQ. ID. NO.: 6;

b) allowing the cells to take up the protein; and

c) measuring a reduction in tumor growth.
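
The distinction drawn in embodiments 16 and 19 above, between the seven listed codon pairs and any other base change, can be sketched as a simple lookup. This is an illustration under stated assumptions, not a diagnostic tool: the names `POLYMORPHIC_CODONS`, `classify_change` and `screen` are hypothetical, the observed changes are invented, and the input is assumed to be already reduced to (position, observed codon) pairs, since how a numbered position maps onto a codon frame is not specified here.

```python
# Codon pairs at the seven polymorphic positions listed in the text.
POLYMORPHIC_CODONS = {
    2201: {"AGC", "AGT"},
    2430: {"TTG", "CTG"},
    2731: {"CCG", "CTG"},
    3232: {"GAA", "GGA"},
    3667: {"AAA", "AGA"},
    4427: {"TCT", "TCC"},
    4956: {"AGT", "GGT"},
}


def classify_change(position: int, observed_codon: str) -> str:
    """Label a codon as a listed polymorphism or as a potential mutation."""
    allowed = POLYMORPHIC_CODONS.get(position)
    if allowed is not None and observed_codon.upper() in allowed:
        return "known polymorphism"
    return "potential mutation"


def screen(changes: list[tuple[int, str]]) -> dict[tuple[int, str], str]:
    """Classify every (position, codon) pair observed in a sample."""
    return {(pos, codon): classify_change(pos, codon) for pos, codon in changes}


# Invented observations, for illustration only.
observed = [(2201, "AGC"), (3232, "GGA"), (2731, "CAG"), (1000, "TAA")]
report = screen(observed)
```

A sample whose report contains only "known polymorphism" entries would correspond to the outcome of embodiment 16; any "potential mutation" entry corresponds to the outcome of embodiment 19 and would call for further interpretation.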



SEQUENCING 

Any nucleic acid specimen, in purified or non-purified form, can be
utilized as the starting nucleic acid or acids, provided it contains, or is suspected
of containing, the specific nucleic acid sequence containing a polymorphic locus.
Thus, the process may amplify, for example, DNA or RNA, including messenger
RNA, wherein DNA or RNA may be single stranded or double stranded. In the
event that RNA is to be used as a template, enzymes and/or conditions optimal
for reverse transcribing the template to DNA would be utilized. In addition, a
DNA-RNA hybrid which contains one strand of each may be utilized. A mixture
of nucleic acids may also be employed, or the nucleic acids produced in a
previous amplification reaction herein, using the same or different primers, may
be so utilized. See TABLE II. The specific nucleic acid sequence to be amplified,
i.e., the polymorphic locus, may be a fraction of a larger molecule or can be
present initially as a discrete molecule, so that the specific sequence constitutes
the entire nucleic acid. It is not necessary that the sequence to be amplified be
present initially in a pure form; it may be a minor fraction of a complex mixture,
such as contained in whole human DNA.

DNA utilized herein may be extracted from a body sample, such as blood, 
tissue material and the like by a variety of techniques such as that described by
Maniatis et al. (Molecular Cloning: A Laboratory Manual, Cold Spring Harbor,
NY, pp. 280-281, 1982). If the extracted sample is impure, it may be treated before
amplification with an amount of a reagent effective to open the cells, or animal 
cell membranes of the sample, and to expose and/or separate the strand(s) of the 
nucleic acid(s). This lysing and nucleic acid denaturing step to expose and 
separate the strands will allow amplification to occur much more readily. 

The deoxyribonucleotide triphosphates dATP, dCTP, dGTP, and dTTP are 
added to the synthesis mixture, either separately or together with the primers, in 
adequate amounts and the resulting solution is heated to about 90°-100°C for
about 1 to 10 minutes, preferably from 1 to 4 minutes. After this heating period,
the solution is allowed to cool, which is preferable for the primer hybridization. 
To the cooled mixture is added an appropriate agent for effecting the primer 
extension reaction (called herein "agent for polymerization"), and the reaction is 
allowed to occur under conditions known in the art. The agent for 
polymerization may also be added together with the other reagents if it is heat 
stable. This synthesis (or amplification) reaction may occur at room temperature 
up to a temperature above which the agent for polymerization no longer 
functions. Thus, for example, if DNA polymerase is used as the agent, the 
temperature is generally no greater than about 40°C. Most conveniently the 
reaction occurs at room temperature. 

The primers used to carry out this invention embrace oligonucleotides of
sufficient length and appropriate sequence to provide initiation of
polymerization. Environmental conditions conducive to synthesis include the
presence of nucleoside triphosphates and an agent for polymerization, such as
DNA polymerase, and a suitable temperature and pH. The primer is preferably
single stranded for maximum efficiency in amplification, but may be double
stranded. If double stranded, the primer is first treated to separate its strands
before being used to prepare extension products. The primer must be sufficiently
long to prime the synthesis of extension products in the presence of the inducing
agent for polymerization. The exact length of primer will depend on many
factors, including temperature, buffer, and nucleotide composition. The
oligonucleotide primer typically contains 12-20 or more nucleotides, although it
may contain fewer nucleotides.

Primers used to carry out this invention are designed to be substantially
complementary to each strand of the genomic locus to be amplified. This means
that the primers must be sufficiently complementary to hybridize with their
respective strands under conditions which allow the agent for polymerization to
perform. In other words, the primers should have sufficient complementarity
with the 5' and 3' sequences flanking the mutation to hybridize therewith and
permit amplification of the genomic locus.
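
Complementarity between a primer and its strand can be illustrated with a short sketch. It is deliberately simplified: the text requires only substantial complementarity, whereas this sketch assumes a perfect match, and the function names and toy sequences are hypothetical rather than actual BRCA1 primers.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_sites(template: str, primer: str) -> list[int]:
    """Return 0-based positions on `template` (written 5' to 3') where
    `primer` can anneal with perfect complementarity."""
    target = reverse_complement(primer)
    template = template.upper()
    sites = []
    start = template.find(target)
    while start != -1:
        sites.append(start)
        start = template.find(target, start + 1)
    return sites


# Toy locus, for illustration only.
plus_strand = "GATTACAGGCTTAACGTCCA"
minus_strand = reverse_complement(plus_strand)
forward_primer = "GATTACAG"   # anneals to the minus strand
reverse_primer = "TGGACGTT"   # anneals to the plus strand
forward_sites = primer_sites(minus_strand, forward_primer)
reverse_sites = primer_sites(plus_strand, reverse_primer)
```

Each primer anneals to one strand at one end of the locus, so extension from both primers spans the region between them.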

Oligonucleotide primers of the invention are employed in the
amplification process which is an enzymatic chain reaction that produces
exponential quantities of polymorphic locus relative to the number of reaction
steps involved. Typically, one primer is complementary to the negative (-)
strand of the polymorphic locus and the other is complementary to the positive
(+) strand. Annealing the primers to denatured nucleic acid followed by
extension with an enzyme, such as the large fragment of DNA polymerase I
(Klenow) and nucleotides, results in newly synthesized + and - strands
containing the target polymorphic locus sequence. Because these newly
synthesized sequences are also templates, repeated cycles of denaturing, primer
annealing, and extension result in exponential production of the region (i.e., the
target polymorphic locus sequence) defined by the primers. The product of the
chain reaction is a discrete nucleic acid duplex with termini corresponding to the
ends of the specific primers employed.
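
The exponential character of the chain reaction can be made concrete with a small calculation. This is a sketch only: the per-cycle `efficiency` parameter and the cycle count of 30 are assumptions introduced for illustration and are not values stated in this document.

```python
def expected_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Estimate target copies after `cycles` rounds of amplification.

    efficiency = 1.0 means every template is copied in every cycle
    (perfect doubling); lower values model incomplete extension.
    """
    if cycles < 0 or not 0.0 <= efficiency <= 1.0:
        raise ValueError("cycles must be >= 0 and efficiency within [0, 1]")
    return initial_copies * (1.0 + efficiency) ** cycles


ideal = expected_copies(1, 30)                        # perfect doubling
realistic = expected_copies(1, 30, efficiency=0.9)    # assumed 90% per cycle
```

Under perfect doubling a single template yields 2 to the power of 30 copies, about a billion, after 30 cycles; any loss of efficiency compounds over the cycles and lowers the yield.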

The oligonucleotide primers of the invention may be prepared using any 
suitable method, such as conventional phosphotriester and phosphodiester 
methods or automated embodiments thereof. In one such automated 
embodiment, diethylphosphoramidites are used as starting materials and may be 
synthesized as described by Beaucage, et al., Tetrahedron Letters, 22:1859-1862, 
1981. One method for synthesizing oligonucleotides on a modified solid support 
is described in U.S. Patent No. 4,458,066. 



The agent for polymerization may be any compound or system which will 
function to accomplish the synthesis of primer extension products, including 
enzymes. Suitable enzymes for this purpose include, for example, E. coli DNA 
polymerase I, Klenow fragment of E. coli DNA polymerase, polymerase muteins, 
reverse transcriptase, and other enzymes, including heat-stable enzymes (i.e., those 
enzymes which perform primer extension after being subjected to temperatures 
sufficiently elevated to cause denaturation), such as Taq polymerase. Suitable 
enzymes will facilitate combination of the nucleotides in the proper manner to 
form the primer extension products which are complementary to each 
polymorphic locus nucleic acid strand. Generally, the synthesis will be initiated 
at the 3' end of each primer and proceed in the 5' direction along the template 
strand, until synthesis terminates, producing molecules of different lengths. 

The newly synthesized strand and its complementary nucleic acid strand 
will form a double-stranded molecule under hybridizing conditions described 
above and this hybrid is used in subsequent steps of the process. In the next step, 
the newly synthesized double-stranded molecule is subjected to denaturing 
conditions using any of the procedures described above to provide 
single-stranded molecules. 
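
Because every newly synthesized strand serves as a template in the next round, the idealized yield of the reaction grows geometrically with the number of cycles. The sketch below is a simplified model under stated assumptions: a constant per-cycle efficiency (a hypothetical parameter, 1.0 meaning perfect doubling) and no plateau, which real reactions do reach.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized number of target copies after `cycles` rounds of amplification.

    Each cycle multiplies the copy number by (1 + efficiency), so perfect
    doubling corresponds to efficiency = 1.0. Reagent depletion and the
    late-cycle plateau are ignored.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# One template molecule, perfect doubling, 35 cycles.
print(f"{pcr_copies(1, 35):.3e}")
# Same start with an assumed 90% per-cycle efficiency.
print(f"{pcr_copies(1, 35, efficiency=0.9):.3e}")
```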

The steps of denaturing, annealing, and extension product synthesis can be 
repeated as often as needed to amplify the target polymorphic locus nucleic acid 
sequence to the extent necessary for detection. The amount of the specific nucleic 
acid sequence produced will accumulate in an exponential fashion. 
Amplification is described in PCR: A Practical Approach, IRL Press, Eds. M. J. 
McPherson, P. Quirke, and G. R. Taylor, 1992. 

The amplification products may be detected by Southern blot analysis, 
without using radioactive probes. In such a process, for example, a small sample 
of DNA containing a very low level of the nucleic acid sequence of the 
polymorphic locus is amplified and analyzed via a Southern blotting technique 
or, similarly, using dot blot analysis. The use of non-radioactive probes or labels 
is facilitated by the high level of the amplified signal. Alternatively, probes used 
to detect the amplified products can be directly or indirectly detectably labeled, for 
example, with a radioisotope, a fluorescent compound, a bioluminescent 
compound, a chemiluminescent compound, a metal chelator, or an enzyme. 
Those of ordinary skill in the art will know of other suitable labels for binding to 
the probe, or will be able to ascertain such, using routine experimentation. 

Sequences amplified by the methods of the invention can be further 
evaluated, detected, cloned, sequenced, and the like, either in solution or after 
binding to a solid support, by any method usually applied to the detection of a 
specific DNA sequence such as PCR, oligomer restriction (Saiki, et al., 
Bio/Technology, 3:1008-1012, 1985), allele-specific oligonucleotide (ASO) probe 
analysis (Conner, et al., Proc. Natl. Acad. Sci. U.S.A., 80:278, 1983), 
oligonucleotide ligation assays (OLAs) (Landgren, et al., Science, 241:1077, 1988), 
and the like. Molecular techniques for DNA analysis have been reviewed 
(Landgren, et al., Science, 1988). 

Preferably, the method of amplifying is by PCR, as described herein and as 
is commonly used by those of ordinary skill in the art. Alternative methods of 
amplification have been described and can also be employed as long as the 
BRCA1 locus amplified by PCR using primers of the invention is similarly 
amplified by the alternative means. Such alternative amplification systems 
include but are not limited to self-sustained sequence replication, which begins 
with a short sequence of RNA of interest and a T7 promoter. Reverse 
transcriptase copies the RNA into cDNA and degrades the RNA, followed by 
reverse transcriptase polymerizing a second strand of DNA. Another nucleic 
acid amplification technique is nucleic acid sequence-based amplification 
(NASBA), which uses reverse transcription and T7 RNA polymerase and 
incorporates two primers to target its cycling scheme. NASBA can begin with 
either DNA or RNA and finish with either, and amplifies to 10^8 copies within 60 
to 90 minutes. Alternatively, nucleic acid can be amplified by ligation activated 
transcription (LAT). LAT works from a single-stranded template with a single 
primer that is partially single-stranded and partially double-stranded. 
Amplification is initiated by ligating a cDNA to the promoter oligonucleotide, 
and within a few hours amplification is 10^8- to 10^9-fold. Another amplification 
system useful in the method of the invention is the Qβ replicase system. The 
Qβ replicase system can be utilized by attaching an RNA sequence called MDV-1 
to RNA complementary to a DNA sequence of interest. Upon mixing with a 
sample, the hybrid RNA finds its complement among the specimen's mRNAs 
and binds, activating the replicase to copy the tag-along sequence of interest. 
Another nucleic acid amplification technique, ligase chain reaction (LCR), works 
by using two differently labeled halves of a sequence of interest which are 
covalently bonded by ligase in the presence of the contiguous sequence in a 
sample, forming a new target. The repair chain reaction (RCR) nucleic acid 
amplification technique uses two complementary and target-specific 
oligonucleotide probe pairs, thermostable polymerase and ligase, and DNA 
nucleotides to geometrically amplify targeted sequences. A 2-base gap separates 
the oligonucleotide probe pairs, and the RCR fills and joins the gap, mimicking 
DNA repair. Nucleic acid amplification by strand displacement activation (SDA) 
utilizes a short primer containing a recognition site for HincII with a short 
overhang on the 5' end which binds to target DNA. A DNA polymerase fills in 
the part of the primer opposite the overhang with sulfur-containing adenine 
analogs. HincII is added but only cuts the unmodified DNA strand. A DNA 
polymerase that lacks 5' exonuclease activity enters at the site of the nick and 
begins to polymerize, displacing the initial primer strand downstream and 
building a new one which serves as more primer. SDA produces greater than 
10^7-fold amplification in 2 hours at 37°C. Unlike PCR and LCR, SDA does not 
require instrumented temperature cycling. 

Another method is a process for amplifying nucleic acid sequences from a 
DNA or RNA template which may be purified or may exist in a mixture of 
nucleic acids. The resulting nucleic acid sequences may be exact copies of the 
template, or may be modified. The process has advantages over PCR in that it 
increases the fidelity of copying a specific nucleic acid sequence, and it allows one 
to more efficiently detect a particular point mutation in a single assay. A target 
nucleic acid is amplified enzymatically while avoiding strand displacement. 
Three primers are used. A first primer is complementary to the first end of the 
target. A second primer is complementary to the second end of the target. A 
third primer is similar to the first end of the target and is 
substantially complementary to at least a portion of the first primer such that, 
when the third primer is hybridized to the first primer, the position of the third 
primer complementary to the base at the 5' end of the first primer contains a 
modification which substantially avoids strand displacement. This method is 
detailed in U.S. Patent 5,593,840 to Bhatnagar et al., 1997. Although PCR is the 
preferred method of amplification of the invention, these other methods can also 
be used to amplify the BRCA1 locus as described in the method of the invention. 

The BRCA1(omi) DNA coding sequences were obtained by end-to-end 
sequencing of the BRCA1 alleles of five subjects in the manner described above, 
followed by analysis of the data obtained. The data obtained provided us with the 
opportunity to evaluate seven previously published polymorphisms and to 
affirm, or correct where necessary, the frequency of occurrence of alternative 
codons. 

GENE THERAPY 

The coding sequences can be used for gene therapy. 
A variety of methods are known for gene transfer, any of which might be 
available for use. 

Direct injection of Recombinant DNA in vivo 

1. Direct injection of "naked" DNA directly with a syringe and needle into a 
specific tissue, infused through a vascular bed, or transferred through a catheter 
into endothelial cells. 

2. Direct injection of DNA that is contained in artificially generated lipid 
vesicles. 

3. Direct injection of DNA conjugated to a targeting structure, such as an 
antibody. 

4. Direct injection by particle bombardment, where the DNA is coated onto 
gold particles and shot into the cells. 

Human Artificial Chromosomes 

This novel gene delivery approach involves the use of human chromosomes 
that have been stripped down to contain only the essential components for 
replication and the genes desired for transfer. 

Receptor-Mediated Gene Transfer 

DNA is linked to a targeting molecule that will bind to specific cell-surface 
receptors, inducing endocytosis and transfer of the DNA into mammalian cells. 
One such technique uses poly-L-lysine to link asialoglycoprotein to DNA. An 
adenovirus is also added to the complex to disrupt the lysosomes and thus allow 
the DNA to avoid degradation and move to the nucleus. Infusion of these 
particles intravenously has resulted in gene transfer into hepatocytes. 

RECOMBINANT VIRUS VECTORS 

Several vectors are used in gene therapy. Among them are the Moloney Murine 
Leukemia Virus (MoMLV) vectors, the adenovirus vectors, the adeno-associated 
virus (AAV) vectors, the herpes simplex virus (HSV) vectors, the 
poxvirus vectors, and human immunodeficiency virus (HIV) vectors. 

GENE REPLACEMENT AND REPAIR 

The ideal genetic manipulation for treatment of a genetic disease would be the 
actual replacement of the defective gene with a normal copy of the gene. 
Homologous recombination is the term used for switching out a section of DNA 
and replacing it with a new piece. By this technique, the defective gene can be 
replaced with a normal gene which expresses a functioning BRCA1 tumor 
growth inhibitor protein. 

A complete description of gene therapy can also be found in "Gene Therapy: A 
Primer For Physicians," 2d Ed., by Kenneth W. Culver, M.D., Publ. Mary Ann 
Liebert Inc. (1996). Two Gene Therapy Protocols for BRCA1 are approved by the 
Recombinant DNA Advisory Committee for Jeffrey T. Holt et al. They are listed 
as 9602-148 and 9603-149 and are available from the NIH. The isolated BRCA1 
gene can be synthesized or constructed from amplification products and inserted 
into a vector such as the LXSN vector. 

The BRCA1 amino acid and nucleic acid sequence may be used to make 
diagnostic probes and antibodies. Labeled diagnostic probes may be used by any 
hybridization method to determine the level of BRCA1 protein in serum or lysed 
cell suspension of a patient, or solid surface cell sample. 

The BRCA1 amino acid sequence may be used to provide a level of 
protection for patients against risk of breast or ovarian cancer or to reduce the 
size of a tumor. Methods of making and extracting proteins are well known. 
Itakura et al., U.S. Patents 4,704,362, 5,221,619, and 5,583,013. BRCA1 has been 
shown to be secreted. Jensen, R.A. et al., Nature Genetics 12:303-308 (1996). 

EXAMPLE 1 

Determination Of The Coding Sequence Of A BRCA1(omi1) Gene From Five 
Individuals 

MATERIALS AND METHODS 

Approximately 150 volunteers were screened in order to identify 
individuals with no cancer history in their immediate family (i.e., first and 
second degree relatives). Each person was asked to fill out a hereditary cancer 
prescreening questionnaire. See TABLE I below. Five of these were randomly 
chosen for end-to-end sequencing of their BRCA1 gene. A first degree relative is 
a parent, sibling, or offspring. A second degree relative is an aunt, uncle, 
grandparent, grandchild, niece, nephew, or half-sibling. 

TABLE I 

Hereditary Cancer Pre-Screening Questionnaire 

Part A: Answer the following questions about your family 

1. To your knowledge, has anyone in your family been diagnosed with a very specific 
hereditary colon disease called Familial Adenomatous Polyposis (FAP)? 

2. To your knowledge, have you or any aunt had breast cancer diagnosed before the age of 35? 

3. Have you had Inflammatory Bowel Disease, also called Crohn's Disease or Ulcerative 
Colitis, for more than 7 years? 

Part B: Refer to the list of cancers below for your responses only to questions in Part B 

Bladder Cancer        Lung Cancer           Pancreatic Cancer 
Breast Cancer         Gastric Cancer        Prostate Cancer 
Colon Cancer          Malignant Melanoma    Renal Cancer 
Endometrial Cancer    Ovarian Cancer        Thyroid Cancer 

4. Have your mother or father, your sisters or brothers or your children had any of the listed 
cancers? 

5. Have there been diagnosed in your mother's brothers or sisters, or your mother's parents 
more than one of the cancers in the above list? 

6. Have there been diagnosed in your father's brothers or sisters, or your father's parents 
more than one of the cancers in the above list? 

Part C: Refer to the list of relatives below for responses only to questions in Part C 

You                         Your mother 
Your sisters or brothers    Your mother's sisters or brothers (maternal aunts and uncles) 
Your children               Your mother's parents (maternal grandparents) 

7. Have there been diagnosed in these relatives 2 or more identical types of cancer? 
Do not count "simple" skin cancer, also called basal cell or squamous cell skin cancer. 

8. Is there a total of 4 or more of any cancers in the list of relatives above other than 
"simple" skin cancers? 

Part D: Refer to the list of relatives below for responses only to questions in Part D. 

You                         Your father 
Your sisters or brothers    Your father's sisters or brothers (paternal aunts and uncles) 
Your children               Your father's parents (paternal grandparents) 

9. Have there been diagnosed in these relatives 2 or more identical types of cancer? 
Do not count "simple" skin cancer, also called basal cell or squamous cell skin cancer. 

10. Is there a total of 4 or more of any cancers in the list of relatives above other than "simple" 
skin cancers? 

© Copyright 1996, OncorMed, Inc. 

Genomic DNA was isolated from white blood cells of five subjects selected 
from analysis of their answers to the questions above. Dideoxy sequence analysis 
was performed following polymerase chain reaction amplification. 

All exons of the BRCA1 gene were subjected to direct dideoxy sequence 
analysis by asymmetric amplification using the polymerase chain reaction (PCR) 
to generate a single stranded product amplified from this DNA sample. 
Shuldiner, et al., Handbook of Techniques in Endocrine Research, p. 457-486, 
DePablo, F., Scanes, C., eds., Academic Press, Inc., 1993. Fluorescent dye was 
attached for automated sequencing using the Taq Dye Terminator® Kit (Perkin-Elmer 
cat# 401628). DNA sequencing was performed in both forward and reverse 
directions on an Applied Biosystems, Inc. (ABI) automated Model 377® 
sequencer. The software used for analysis of the resulting data was Sequence 
Navigator® software purchased through ABI. 

1. Polymerase Chain Reaction (PCR) Amplification 

Genomic DNA (100 nanograms) was extracted from white blood cells of five 
subjects. Each of the five samples was sequenced end to end. Each sample was 
amplified in a final volume of 25 microliters containing 1 microliter (100 
nanograms) genomic DNA, 2.5 microliters 10X PCR buffer (100 mM Tris, pH 8.3, 
500 mM KCl, 1.2 mM MgCl2), 2.5 microliters 10X dNTP mix (2 mM each 
nucleotide), 2.5 microliters forward primer, 2.5 microliters reverse primer, 
1 microliter Taq polymerase (5 units), and 13 microliters of water. 
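
The recipe above can be checked arithmetically: the component volumes should add up to the stated 25 microliter final volume, and each 10X stock is diluted tenfold. The sketch below is only a bookkeeping aid; the dictionary keys and the helper name `final_concentration` are illustrative and not part of the protocol.

```python
# Component volumes in microliters, as listed in the reaction recipe.
components_ul = {
    "genomic DNA": 1.0,
    "10X PCR buffer": 2.5,
    "10X dNTP mix": 2.5,
    "forward primer": 2.5,
    "reverse primer": 2.5,
    "Taq polymerase": 1.0,
    "water": 13.0,
}

total_ul = sum(components_ul.values())


def final_concentration(stock: float, volume_ul: float, total_volume_ul: float) -> float:
    """Dilution rule C1*V1 = C2*V2, solved for the final concentration C2."""
    return stock * volume_ul / total_volume_ul


# Stock concentrations in mM taken from the recipe.
dntp_mM = final_concentration(2.0, components_ul["10X dNTP mix"], total_ul)
tris_mM = final_concentration(100.0, components_ul["10X PCR buffer"], total_ul)
kcl_mM = final_concentration(500.0, components_ul["10X PCR buffer"], total_ul)

print(f"total volume: {total_ul} uL")
print(f"each dNTP: {dntp_mM} mM, Tris: {tris_mM} mM, KCl: {kcl_mM} mM")
```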

The primers in Table II, below, were used to carry out amplification of the 
various sections of the BRCA1 gene samples. The primers were synthesized on 
a DNA/RNA Model 394® Synthesizer. 

TABLE II 

BRCA1 PRIMERS AND SEQUENCING DATA 

EXON      PRIMER   SEQUENCE                                 SEQ ID NO.  MER     Mg++  SIZE 

EXON 2    2F       5'-GAA GTT GTC ATT TTA TAA ACC TTT-3'    7           24      1.6   ~275 
          2R       5'-TGT CTT TTC TTC CCT AGT ATG T-3'      8           22 
EXON 3    3F       5'-TCC TGA CAC AGC AGA CAT TTA-3'        9           21      1.4   ~375 
          3R       5'-TTG GAT TTT CGT TCT CAC TTA-3'        10 
EXON 5    5F       5'-CTC TTA AGG GCA GTT GTG AG-3'         11          20      1.2   ~275 
          5R       5'-TTC CTA CTG TCG TTG CTT CC-3'         12          20 (1) 
EXON 6    6/7F     5'-CTT ATT TTA GTG TCC TTA AAA GG-3'     13          23      1.6   ~250 
          6R       5'-TTT CAT GGA CAG CAC TTG AGT G-3'      14          22 
EXON 7    7F       5'-CAC AAC AAA GAG CAT ACA TAG GG-3'     15          23      1.6   ~275 
          6/7R     5'-TCG GGT TCA CTC TGT AGA AG-3'         16          20 
EXON 8    8F1      5'-TTC TCT TCA GGA GGA AAA GCA-3'        17          21      1.2   ~270 
          8R1      5'-OCT Q0C TAC CAC AAA TAC AAA-3'        18          21 
EXON 9    9F       5'-CCA CAG TAG ATG CTC AGT AAA TA-3'     19          23      1.2   ~250 
          9R       5'-TAG GAA AAT ACC AGC TTC ATA GA-3'     20          23 
EXON 10   10F      5'-TCG TCA GCT TTC TGT AAT CG-3'         21          20      1.6   ~250 
          10R      5'-GTA TCT ACC CAC TCT CTT CTT CAG-3'    22          24 
EXON 11A  11AF     5'-CCA CCT CCA AGG TGT ATC A-3'          23          19      1.2   372 
          11AR     5'-TGT TAT GTT GGC TCC TTG CT-3'         24          20 
EXON 11B  11BF1    5'-CAC TAA AGA CAG AAT GAA TCT A-3'      25          21      1.2   ~400 
          11BR1    5'-GAA GAA CCA GAA TAT TCA TCT A-3'      26          21 
EXON 11C  11CF1    5'-TGA TCG GGA GTC TGA ATC AA-3'         27          20      1.2   ~400 
          11CR1    5'-TCT GCT TTC TTG ATA AAA TCC T-3'      28          22 
EXON 11D  11DF1    5'-AGC GTC CCC TCA CAA ATA AA-3'         29          20      1.2   ~400 
          11DR1    5'-TCA AGC GCA TGA ATA TCC CT-3'         30          20 
EXON 11E  11EF     5'-GTA TAA GCA ATA TCG AAC TCG A-3'      31          22      1.2   388 
          11ER     5'-TTA AGT TCA CTG GTA TTT GAA CA-3'     32          23 
EXON 11F  11FF     5'-GAC AGC GAT ACT TTC CCA GA-3'         33          20      1.2   382 
          11FR     5'-TCG AAC AAC CAT GAA TTA GTC-3'        34          21 
EXON 11G  11GF     5'-GGA AGT TAG CAC TCT AGG GA-3'         35          20      1.2   423 
          11GR     5'-GCA GTG ATA TTA ACT GTC TGT A-3'      36          22 
EXON 11H  11HF     5'-TCG GTC CTT AAA GAA ACA AAG T-3'      37          22      1.2   366 
          11HR     5'-TCA GGT GAC ATT GAA TCX TCC-3'        38          21 
EXON 11I  11IF     5'-CCA CTT TTT CCC ATC AAG TCA-3'        39          21      1.2   377 
          11IR     5'-TCA GGA TGC TTA CAA TTA CTT C-3'      40          21 
EXON 11J  11JF     5'-CAA AAT TGA ATG CTA TGC TTA GA-3'     41          23      1.2   377 
          11JR     5'-TCG GTA ACC CTG AGC CAA AT-3'         42          20 
EXON 11K  11KF     5'-GCA AAA GCG TCC AGA AAG GA-3'         43          20      1.2   396 
          11KR-1   5'-TAT TTG CAG TCA AGT CTT CCA A-3'      44          22 
EXON 11L  11LF-1   5'-GTA ATA TTG GCA AAG GCA TCT-3'        45          22      1.2   360 
          11LR     5'-TAA AAT GTG CTC CCC AAA AGC A-3'      46          22* 
EXON 12   12F      5'-GTC CTC CCA ATG AGA AGA AA-3'         47          20      1.2   ~300 
          12R      5'-TGT CAG CAA ACC TAA GAA TGT-3'        48          21 
EXON 13   13F      5'-AAT GGA AAG CTT CTC AAA GTA-3'        49          21      1.2   ~325 
          13R      5'-ATG TTG GAG CTA GGT CCT TAC-3'        50          21 
EXON 14   14F      5'-CTA ACC TGA ATT ATC ACT ATC A-3'      51          22      1.2   ~310 
          14R      5'-GTG TAT AAA TGC CTG TAT GCA-3'        52          21 
EXON 15   15F      5'-TCG CTG CCC AGG AAG TAT G-3'          53          19      1.2   ~375 
          15R      5'-AAC CAG AAT ATC TTT ATG TAG GA-3'     54          23 
EXON 16   16F      5'-AAT TCT TAA CAG AGA CCA GAA C-3'      55          22      1.6   ~550 
          16R      5'-AAA ACT CTT TCC AGA ATG TTG T-3'      56          22 
EXON 17   17F      5'-GTC TAG AAC GTG CAG GAT TG-3'         57          20      1.2   ~275 
          17R      5'-TCG CCT CAT GTC GTT TTA-3'            58          18 
EXON 18   18F      5'-GGC TCT TTA GOT TCT TAG GAC-3'        59          21      1.2   ~350 
          18R      5'-GAG ACC ATT TTC CCA GCA TC-3'         60          20 
EXON 19   19F      5'-CTG TCA TTC TTC CTG TGC TC-3'         61          20      1.2   ~250 
          19R      5'-CAT TGT TAA GGA AAG TCG TGC-3'        62          21 
EXON 20   20F      5'-ATA TGA CGT GTC TGC TCC AC-3'         63          20      1.2   ~425 
          20R      5'-GGG AAT CCA AAT TAC ACA GC-3'         64          20 
EXON 21   21F      5'-AAG CTC TTC CTT TTT GAA AGT C-3'      65          22      1.6   ~300 
          21R      5'-GTA GAG AAA TAG AAT AGC CTC T-3'      66          22 
EXON 22   22F      5'-TCC CAT TGA GAG GTC TTG CT-3'         67          20      1.6   ~300 
          22R      5'-GAG AAG ACT TCT GAG GCT AC-3'         68          20 
EXON 23   23F-1    5'-TGA AGT GAC AGT TCC AGT AGT-3'        69          21      1.2   ~250 
          23R-1    5'-CAT TTT AGC CAT TCA TTC AAC AA-3'     70          23 
EXON 24   24F      5'-ATG AAT TGA CAC TAA TCT CTG C-3'      71          22      1.4   ~285 
          24R      5'-GTA GCC AGG ACA GTA GAA GGA-3'        72          21 

(1) M13 tailed 

Thirty-five cycles were performed, each consisting of denaturing (95°C; 30 
seconds), annealing (55°C; 1 minute), and extension (72°C; 90 seconds), except during 
the first cycle, in which the denaturing time was increased to 5 minutes, and during the 
last cycle, in which the extension time was increased to 5 minutes. 
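
The cycling program just described can be written out step by step to estimate the programmed hold time. This is a sketch under one stated assumption: temperature ramp times between steps are ignored, so the real run time on a thermal cycler will be longer. The constant and function names are illustrative.

```python
# Hold times in seconds for a standard cycle, as stated in the text.
DENATURE_S, ANNEAL_S, EXTEND_S = 30, 60, 90
CYCLES = 35
FIRST_DENATURE_S = 5 * 60  # extended denaturation in the first cycle
LAST_EXTEND_S = 5 * 60     # extended extension in the last cycle


def cycle_durations():
    """Return a list of (denature, anneal, extend) hold times, one per cycle."""
    steps = []
    for cycle in range(1, CYCLES + 1):
        denature = FIRST_DENATURE_S if cycle == 1 else DENATURE_S
        extend = LAST_EXTEND_S if cycle == CYCLES else EXTEND_S
        steps.append((denature, ANNEAL_S, extend))
    return steps


total_s = sum(sum(step) for step in cycle_durations())
print(f"programmed hold time: {total_s} s ({total_s / 60:.0f} min)")
```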
PCR products were purified using Qia-quick® PCR purification kits (Qiagen cat# 
28104; Chatsworth, CA). Yield and purity of the PCR product were determined 
spectrophotometrically at OD260 on a Beckman DU 650 spectrophotometer. 

2. Dideoxy Sequence Analysis 

Fluorescent dye was attached to PCR products for automated sequencing using the Taq 
Dye Terminator® Kit (Perkin-Elmer cat# 401628). DNA sequencing was performed in 
both forward and reverse directions on an Applied Biosystems, Inc. (ABI), Foster City, 
CA, automated Model 377® sequencer. The software used for analysis of the resulting 
data was "Sequence Navigator® software" purchased through ABI. 

3. RESULTS 

Differences in the nucleic acids of the ten alleles from five individuals were found in 
seven locations on the gene. The changes and their positions are found in TABLE III, 
below. 

TABLE III 
PANEL TYPING 

AMINO ACID                    NUCLEOTIDE CHANGE 
CHANGE             EXON       1     2     3     4     5       FREQUENCY 

SER(SER) (694)     11E        C/C   C/T   C/T   T/T   T/T     0.4 C, 0.6 T 
LEU(LEU) (771)     11F        T/T   C/T   C/T   C/C   C/C     0.4 T, 0.6 C 
PRO(LEU) (871)     11G        C/T   C/T   C/T   T/T   T/T     0.3 C, 0.7 T 
GLU(GLY) (1038)    11I        A/A   A/G   A/G   G/G   G/G     0.4 A, 0.6 G 
LYS(ARG) (1183)    11J        A/A   A/G   A/G   G/G   G/G     0.4 A, 0.6 G 
SER(SER) (1436)    13         T/T   T/T   T/C   C/C   C/C     0.5 T, 0.5 C 
SER(GLY) (1613)    16         A/A   A/G   A/G   G/G   G/G     0.4 A, 0.6 G 

Tables III and IV depict one aspect of the invention: sets of at least two alternative 
codon pairs wherein the codon pairs occur in the following frequencies, 
respectively, in a population of individuals free of disease: 

at position 2201, AGC and AGT occur at frequencies from about 35-45%, 
and from about 55-65%, respectively; 

at position 2430, TTG and CTG occur at frequencies from about 35-45%, 
and from about 55-65%, respectively; 

at position 2731, CCG and CTG occur at frequencies from about 25-35%, 
and from about 65-75%, respectively; 

at position 3232, GAA and GGA occur at frequencies from about 35-45%, 
and from about 55-65%, respectively; 

at position 3667, AAA and AGA occur at frequencies from about 35-45%, 
and from about 55-65%, respectively; 

at position 4427, TCT and TCC occur at frequencies from about 45-55%, 
and from about 45-55%, respectively; and 

at position 4956, AGT and GGT occur at frequencies from about 35-45%, 
and from about 55-65%, respectively. 
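
The allele frequencies reported for the five-person panel follow from simple allele counting: each individual contributes two alleles, so five genotypes give ten alleles per site. The sketch below reproduces that count from the genotypes listed in TABLE III; the dictionary layout and function name are illustrative choices.

```python
from collections import Counter

# Genotypes of the five individuals at each polymorphic site (from TABLE III),
# keyed by exon segment and amino acid position.
GENOTYPES = {
    "11E (694)": ["C/C", "C/T", "C/T", "T/T", "T/T"],
    "11F (771)": ["T/T", "C/T", "C/T", "C/C", "C/C"],
    "11G (871)": ["C/T", "C/T", "C/T", "T/T", "T/T"],
    "11I (1038)": ["A/A", "A/G", "A/G", "G/G", "G/G"],
    "11J (1183)": ["A/A", "A/G", "A/G", "G/G", "G/G"],
    "13 (1436)": ["T/T", "T/T", "T/C", "C/C", "C/C"],
    "16 (1613)": ["A/A", "A/G", "A/G", "G/G", "G/G"],
}


def allele_frequencies(genotypes):
    """Count both alleles of every genotype and return each allele's fraction."""
    counts = Counter(allele for g in genotypes for allele in g.split("/"))
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}


for site, genotypes in GENOTYPES.items():
    print(site, allele_frequencies(genotypes))
```

With only ten alleles per site, each frequency is resolved in steps of 0.1, which is consistent with the "about" ranges quoted above.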



The data show that, for each of the samples, the BRCA1 gene is identical except 
in the region of seven polymorphisms. These polymorphic regions, together with their 
locations, the amino acid groups of each codon, the frequency of their occurrence and 
the amino acid coded for by each codon are found in TABLE IV below. 

TABLE IV 

CODON AND BASE CHANGES IN SEVEN POLYMORPHIC SITES OF BRCA1 GENE 

SAMPLE      BASE     POSITION            CODON      AA        PUBLISHED         FREQUENCY 
NAME        CHANGE   nt/aa       EXON    CHANGE     CHANGE    FREQUENCY (2)     IN THIS STUDY 

2,3,4,5     C-T      2201/694    11E     AGC(AGT)   SER-SER   UNPUBLISHED       C=40% 
2,3,4,5     T-C      2430/771    11F     TTG(CTG)   LEU-LEU   T=67% (ref. 13)   T=40% 
1,2,3,4,5   C-T      2731/871    11G     CCG(CTG)   PRO-LEU   C=34% (ref. 12)   C=30% 
2,3,4,5     A-G      3232/1038   11I     GAA(GGA)   GLU-GLY   A=67% (ref. 13)   A=40% 
2,3,4,5     A-G      3667/1183   11J     AAA(AGA)   LYS-ARG   A=68% (ref. 12)   A=40% 
3,4,5       T-C      4427/1436   13      TCT(TCC)   SER-SER   T=67% (ref. 12)   T=50% 
2,3,4,5     A-G      4956/1613   16      AGT(GGT)   SER-GLY   A=67% (ref. 12)   A=40% 

(2) Reference numbers correspond to the Table of References below. 

EXAMPLE 2 

Determination Of A Normal Individual Using BRCA1(omi1) And The Seven Polymorphisms 
For Reference 

A person skilled in the art of genetic susceptibility testing will find the present 
invention useful for: 

a) identifying individuals having a normal BRCA1 gene, who therefore have no 
elevated genetic susceptibility to breast or ovarian cancer from a BRCA1 
mutation; 

b) avoiding misinterpretation of polymorphisms found in the 
BRCA1 gene. 

Sequencing is carried out as in EXAMPLE 1 using a blood sample from the patient in 
question. However, a BRCA1(omi) sequence is used for reference and the polymorphic 
sites are compared to the nucleic acid sequences listed above for codons at each 
polymorphic site. A normal sample is one which compares to a BRCA1(omi) sequence and 
contains one of the base variations which occur at each of the polymorphic sites. The 
codons which occur at each of the polymorphic sites are paired here for reference. 

• AGC and AGT at position 2201, 
• TTG and CTG at position 2430, 
• CCG and CTG at position 2731, 
• GAA and GGA at position 3232, 
• AAA and AGA at position 3667, 
• TCT and TCC at position 4427, and 
• AGT and GGT at position 4956. 

The availability of these polymorphic pairs provides added assurance that one skilled in 
the art can correctly interpret the polymorphic variations without mistaking a 
variation for a mutation. 
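
The list of paired codons can serve as a lookup table when a variation is interpreted. The sketch below is one hypothetical way to encode it; the function name and the returned labels are illustrative, and a real interpretation would still be reviewed by the technologists and geneticist.

```python
# Codon pairs observed at the seven polymorphic sites, keyed by nucleotide position.
POLYMORPHIC_CODONS = {
    2201: ("AGC", "AGT"),
    2430: ("TTG", "CTG"),
    2731: ("CCG", "CTG"),
    3232: ("GAA", "GGA"),
    3667: ("AAA", "AGA"),
    4427: ("TCT", "TCC"),
    4956: ("AGT", "GGT"),
}


def classify_codon(position: int, codon: str) -> str:
    """Label a codon observed at `position` against the known polymorphic pairs."""
    pair = POLYMORPHIC_CODONS.get(position)
    if pair is None:
        return "not a listed polymorphic site"
    if codon.upper() in pair:
        return "known polymorphism"
    return "unlisted variant, review as possible mutation"


print(classify_codon(2201, "AGT"))
print(classify_codon(2731, "CAG"))
```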

Exon 11 of the BRCA1 gene is subjected to direct dideoxy sequence analysis by 
asymmetric amplification using the polymerase chain reaction (PCR) to generate a 
single stranded product amplified from this DNA sample. Shuldiner, et al., Handbook 
of Techniques in Endocrine Research, p. 457-486, DePablo, F., Scanes, C., eds., Academic 
Press, Inc., 1993. Fluorescent dye is attached for automated sequencing using the Taq 
Dye Terminator® Kit (Perkin-Elmer cat# 401628). DNA sequencing is performed in 
both forward and reverse directions on an Applied Biosystems, Inc. (ABI) automated 
Model 377® sequencer. The software used for analysis of the resulting data is "Sequence 
Navigator® software" purchased through ABI. 

1. Polymerase Chain Reaction (PCR) Amplification 

Genomic DNA (100 nanograms) extracted from white blood cells of the subject is 
amplified in a final volume of 25 microliters containing 1 microliter (100 nanograms) 
genomic DNA, 2.5 microliters 10X PCR buffer (100 mM Tris, pH 8.3, 500 mM KCl, 1.2 
mM MgCl2), 2.5 microliters 10X dNTP mix (2 mM each nucleotide), 2.5 microliters 
forward primer (BRCA1-11K-F, 10 micromolar solution), 2.5 microliters reverse primer 
(BRCA1-11K-R, 10 micromolar solution), 1 microliter Taq polymerase (5 units), and 
13 microliters of water. 

The PCR primers used to amplify a patient's sample BRCA1 gene are listed in Table II. 
The primers were synthesized on a DNA/RNA Model 394® Synthesizer. Thirty-five 
cycles of amplification are performed, each consisting of denaturing (95°C; 30 
seconds), annealing (55°C; 1 minute), and extension (72°C; 90 seconds), except during 
the first cycle, in which the denaturing time is increased to 5 minutes, and during the 
last cycle, in which the extension time is increased to 5 minutes. 

PCR products are purified using Qia-quick® PCR purification kits (Qiagen, cat# 28104; 
Chatsworth, CA). Yield and purity of the PCR product are determined 
spectrophotometrically at OD260 on a Beckman DU 650 spectrophotometer. 

2. Dideoxy Sequence Analysis 

Fluorescent dye is attached to PCR products for automated sequencing using the Taq 
Dye Terminator® Kit (Perkin-Elmer cat# 401628). DNA sequencing is performed in 
both forward and reverse directions on an Applied Biosystems, Inc. (ABI), Foster City, 
CA, automated Model 377® sequencer. The software used for analysis of the resulting 
data is "Sequence Navigator® software" purchased through ABI. The BRCA1(omi1) SEQ. 
ID. NO.: 1 sequence is entered into the Sequence Navigator® software as the Standard for 
comparison. The Sequence Navigator® software compares the sample sequence to the 
BRCA1(omi1) SEQ. ID. NO.: 1 standard, base by base. The Sequence Navigator® software 
highlights all differences between the BRCA1(omi1) SEQ. ID. NO.: 1 DNA sequence and 
the patient's sample sequence. 
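
A base-by-base comparison of this kind can be sketched in a few lines. This is not a description of how the Sequence Navigator® software works internally; it is a minimal illustration that assumes the two sequences are already aligned and of equal length, and the short sequences used are invented.

```python
def find_differences(reference: str, sample: str):
    """Return (position, reference_base, sample_base) for every mismatch.

    Positions are 1-based. Both sequences must already be aligned and have
    the same length; insertions and deletions are not handled.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    pairs = zip(reference.upper(), sample.upper())
    return [(i, r, s) for i, (r, s) in enumerate(pairs, start=1) if r != s]


# Hypothetical fragments for illustration only (not BRCA1 sequence data).
reference = "AGCTTGCCG"
sample = "AGTTTGCTG"
for position, ref_base, sample_base in find_differences(reference, sample):
    print(f"position {position}: {ref_base} -> {sample_base}")
```

Each reported difference would then be checked against the list of known polymorphisms before being treated as a candidate mutation.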

A first technologist checks the computerized results by comparing visually the 
BRCA1(omi1) SEQ. ID. NO.: 1 standard against the patient's sample, and again highlights 
any differences between the standard and the sample. The first primary technologist 
then interprets the sequence variations at each position along the sequence. 
Chromatograms from each sequence variation are generated by the Sequence 
Navigator® software and printed on a color printer. The peaks are interpreted by the 
first primary technologist and a second primary technologist. A secondary technologist 
then reviews the chromatograms. The results are finally interpreted by a geneticist. In 
each instance, a variation is compared to known polymorphisms for position and base 
change. If the sample BRCA1 sequence matches the BRCA1(omi1) SEQ. ID. NO.: 1 
standard, with only variations within the known list of polymorphisms, it is 
interpreted as a normal gene sequence. 

EXAMPLE 3 

DETERMINING THE ABSENCE OF A MUTATION IN THE BRCA1 GENE USING 
BRCA1(omi1) AND SEVEN POLYMORPHISMS FOR REFERENCE 

A person skilled in the art of genetic susceptibility testing will find the present 
invention useful for determining the presence of a known or previously unknown 
mutation in the BRCA1 gene. A list of mutations of BRCA1 is publicly available in the 
Breast Cancer Information Core at 
http://www.nchgr.nih.gov/dir/lab_transfer/bic. This data site became publicly 
available on November 1, 1995. Friend, S. et al., Nature Genetics 11:238 (1995). 

Sequencing is carried out as in EXAMPLE 1 using a blood sample from the patient in 
question. However, a BRCA1(omi) sequence is used for reference and polymorphic sites 
are compared to the nucleic acid sequences listed above for codons at each polymorphic 
site. A normal sample is one which compares to the BRCA1(omi2) SEQ. ID. NO.: 3 sequence and 
contains one of the base variations which occur at each of the polymorphic sites. The 
codons which occur at each of the polymorphic sites are paired here for reference. 

• AGC and AGT at position 2201, 
• TTG and CTG at position 2430, 
• CCG and CTG at position 2731, 
• GAA and GGA at position 3232, 
• AAA and AGA at position 3667, 
• TCT and TCC at position 4427, and 
• AGT and GGT at position 4956. 

The availability of these polymorphic pairs provides added assurance that one skilled in 
the art can correctly interpret the polymorphic variations without mistaking a 
variation for a mutation. 
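
The use of the polymorphic pairs can be illustrated with a short sketch. It is not part of the described laboratory procedure; the table simply restates the seven codon pairs listed above, and the function name and return strings are hypothetical choices made for illustration.

```python
# Codon pairs restated from the list above, keyed by nucleotide position.
KNOWN_POLYMORPHISMS = {
    2201: {"AGC", "AGT"},
    2430: {"TTG", "CTG"},
    2731: {"CCG", "CTG"},
    3232: {"GAA", "GGA"},
    3667: {"AAA", "AGA"},
    4427: {"TCT", "TCC"},
    4956: {"AGT", "GGT"},
}


def classify_variation(position, observed_codon):
    """Classify a codon observed at a nucleotide position (hypothetical helper).

    Returns "known polymorphism" when the codon is one of the listed pair,
    "possible mutation" when the site is listed but the codon is not, and
    "not a listed polymorphic site" for any other position.
    """
    allowed = KNOWN_POLYMORPHISMS.get(position)
    if allowed is None:
        return "not a listed polymorphic site"
    if observed_codon.upper() in allowed:
        return "known polymorphism"
    return "possible mutation"
```

A variation flagged as "possible mutation" would still go to the technologists and the geneticist for interpretation; the sketch only mirrors the lookup step.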

Exon 11 of the BRCA1 gene is subjected to direct dideoxy sequence analysis by
asymmetric amplification using the polymerase chain reaction (PCR) to generate a
single stranded product amplified from this DNA sample. Shuldiner, et al., Handbook
of Techniques in Endocrine Research, p. 457-486, DePablo, F., Scanes, C., eds., Academic
Press, Inc., 1993. Fluorescent dye is attached for automated sequencing using the Taq
Dye Terminator® Kit (Perkin-Elmer cat# 401628). DNA sequencing is performed in
both forward and reverse directions on an Applied Biosystems, Inc. (ABI) automated
Model 377® sequencer. The software used for analysis of the resulting data is "Sequence
Navigator® software" purchased through ABI.

1. Polymerase Chain Reaction (PCR) Amplification 

Genomic DNA (100 nanograms) extracted from white blood cells of the subject is
amplified in a final volume of 25 microliters containing 1 microliter (100 nanograms)
genomic DNA, 2.5 microliters 10X PCR buffer (100 mM Tris, pH 8.3, 500 mM KCl, 1.2
mM MgCl2), 2.5 microliters 10X dNTP mix (2 mM each nucleotide), 2.5 microliters
forward primer (BRCA1-11K-F, 10 micromolar solution), 2.5 microliters reverse primer
(BRCA1-11K-R, 10 micromolar solution), 1 microliter Taq polymerase (5 units), and
13 microliters of water.
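
As a quick arithmetic check on the reaction mix above, the following sketch sums the component volumes and derives final concentrations by simple dilution. The dictionary keys and function names are illustrative assumptions; the volumes and stock concentrations are those stated in the text.

```python
# Component volumes in microliters, as stated in the reaction description.
REACTION_MIX_UL = {
    "genomic DNA (100 ng)": 1.0,
    "10X PCR buffer": 2.5,
    "10X dNTP mix": 2.5,
    "forward primer BRCA1-11K-F (10 uM)": 2.5,
    "reverse primer BRCA1-11K-R (10 uM)": 2.5,
    "Taq polymerase (5 units)": 1.0,
    "water": 13.0,
}


def total_volume_ul(mix):
    """Sum the component volumes of a reaction mix."""
    return sum(mix.values())


def final_concentration(stock, volume_ul, total_ul):
    """Concentration after diluting `volume_ul` of stock into `total_ul`."""
    return stock * volume_ul / total_ul


total = total_volume_ul(REACTION_MIX_UL)               # 25.0 microliters
primer_um = final_concentration(10.0, 2.5, total)      # each primer, micromolar
dntp_mm = final_concentration(2.0, 2.5, total)         # each nucleotide, millimolar
```

The components add up to the stated 25 microliter final volume, giving each primer at 1 micromolar and each nucleotide at 0.2 mM.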

The PCR primers used to amplify a patient's sample BRCA1 gene are listed in Table II.
The primers were synthesized on a DNA/RNA Model 394® Synthesizer. Thirty-five
cycles of amplification are performed, each consisting of denaturing (95°C; 30
seconds), annealing (55°C; 1 minute), and extension (72°C; 90 seconds), except during
the first cycle in which the denaturing time is increased to 5 minutes, and during the
last cycle in which the extension time is increased to 5 minutes.

PCR products are purified using Qia-quick® PCR purification kits (Qiagen, cat#
28104; Chatsworth, CA). Yield and purity of the PCR product are determined
spectrophotometrically at OD260 on a Beckman DU 650 spectrophotometer.
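
The thirty-five cycle amplification program described in this section can be written out as data, which makes the first-cycle and last-cycle exceptions explicit. This is only a sketch of the stated hold times; ramp times between temperatures are not given in the text and are ignored, and the names used are hypothetical.

```python
CYCLES = 35
DENATURE_S, ANNEAL_S, EXTEND_S = 30, 60, 90   # standard hold times in seconds
FIRST_DENATURE_S = 5 * 60                     # extended denaturing, first cycle only
LAST_EXTEND_S = 5 * 60                        # extended extension, last cycle only


def cycle_steps(cycle_number):
    """Return (step, temperature in C, seconds) tuples for one cycle."""
    denature = FIRST_DENATURE_S if cycle_number == 1 else DENATURE_S
    extend = LAST_EXTEND_S if cycle_number == CYCLES else EXTEND_S
    return [
        ("denature", 95, denature),
        ("anneal", 55, ANNEAL_S),
        ("extend", 72, extend),
    ]


def total_hold_seconds():
    """Total programmed hold time over all cycles, excluding ramping."""
    return sum(
        seconds
        for n in range(1, CYCLES + 1)
        for _step, _temp, seconds in cycle_steps(n)
    )
```

Under these assumptions the programmed hold time is 6780 seconds, or 113 minutes.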

2. Dideoxy Sequence Analysis 

Fluorescent dye is attached to PCR products for automated sequencing using the Taq
Dye Terminator® Kit (Perkin-Elmer cat# 401628). DNA sequencing is performed in
both forward and reverse directions on an Applied Biosystems, Inc. (ABI) Foster City,
CA., automated Model 377® sequencer. The software used for analysis of the resulting
data is "Sequence Navigator® software" purchased through ABI. The BRCA1(omi2) SEQ.
ID. NO.: 3 sequence is entered into the Sequence Navigator® software as the Standard
for comparison. The Sequence Navigator® software compares the sample sequence to
the BRCA1(omi2) SEQ. ID. NO.: 3 standard, base by base. The Sequence Navigator®
software highlights all differences between the BRCA1(omi2) SEQ. ID. NO.: 3 DNA
sequence and the patient's sample sequence.
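
The base-by-base comparison against a standard can be pictured with the minimal sketch below. It is not the Sequence Navigator® software and makes no claim about how that product works; it assumes two already-aligned sequences of equal length, and the function name is hypothetical.

```python
def find_differences(standard, sample):
    """List every mismatch between two aligned sequences of equal length.

    Each entry is (1-based position, base in standard, base in sample).
    """
    if len(standard) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(
            zip(standard.upper(), sample.upper()), start=1
        )
        if ref_base != sample_base
    ]


# Toy sequences for illustration only; they are not taken from the listing.
differences = find_differences("ATGGATTTATCT", "ATGGACTTATCT")
```

Each reported position would then be checked against the list of known polymorphisms before being interpreted.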

A first technologist checks the computerized results by comparing visually the
BRCA1(omi2) SEQ. ID. NO.: 3 standard against the patient's sample, and again highlights
any differences between the standard and the sample. The first primary technologist
then interprets the sequence variations at each position along the sequence.
Chromatograms from each sequence variation are generated by the Sequence
Navigator® software and printed on a color printer. The peaks are interpreted by the
first primary technologist and also by a second primary technologist. A secondary
technologist then reviews the chromatograms. The results are finally interpreted by a
geneticist. In each instance, a variation is compared to known polymorphisms for
position and base change. If the sample BRCA1 sequence matches the BRCA1(omi2) SEQ.
ID. NO.: 3 standard, with only variations within the known list of polymorphisms, it is
interpreted as a gene sequence.

EXAMPLE 4

DETERMINING THE PRESENCE OF A MUTATION IN THE BRCA1 GENE USING
BRCA1(omi1) AND SEVEN POLYMORPHISMS FOR REFERENCE

A person skilled in the art of genetic susceptibility testing will find the present
invention useful for determining the presence of a known or previously unknown
mutation in the BRCA1 gene. A list of mutations of BRCA1 is publicly available in the
Breast Cancer Information Core at:

http://www.nchgr.nih.gov/dir/lab_transfer/bic. This data site became publicly
available on November 1, 1995. Friend, S. et al., Nature Genetics 11:238 (1995). In this
example, a mutation in exon 11 is characterized by amplifying the region of the
mutation with a primer which matches the region of the mutation.

Exon 11 of the BRCA1 gene is subjected to direct dideoxy sequence analysis by
asymmetric amplification using the polymerase chain reaction (PCR) to generate a
single stranded product amplified from this DNA sample. Shuldiner, et al., Handbook
of Techniques in Endocrine Research, p. 457-486, DePablo, F., Scanes, C., eds., Academic
Press, Inc., 1993. Fluorescent dye is attached for automated sequencing using the Taq
Dye Terminator® Kit (Perkin-Elmer cat# 401628). DNA sequencing is performed in
both forward and reverse directions on an Applied Biosystems, Inc. (ABI) automated
Model 377® sequencer. The software used for analysis of the resulting data is "Sequence
Navigator® software" purchased through ABI.

1. Polymerase Chain Reaction (PCR) Amplification 

Genomic DNA (100 nanograms) extracted from white blood cells of the subject is
amplified in a final volume of 25 microliters containing 1 microliter (100 nanograms)
genomic DNA, 2.5 microliters 10X PCR buffer (100 mM Tris, pH 8.3, 500 mM KCl, 1.2
mM MgCl2), 2.5 microliters 10X dNTP mix (2 mM each nucleotide), 2.5 microliters
forward primer (BRCA1-11K-F, 10 micromolar solution), 2.5 microliters reverse primer
(BRCA1-11K-R, 10 micromolar solution), 1 microliter Taq polymerase (5 units), and
13 microliters of water.

The PCR primers used to amplify segment K of exon 11 (where the mutation is found)
are as follows:

BRCA1-11K-F: 5'-GCA AAA GCG TCC AGA AAG GA-3' SEQ ID NO:69
BRCA1-11K-R: 5'-AGT CTT CCA ATT CAC TGC AC-3' SEQ ID NO:70

The primers are synthesized on a DNA/RNA Model 394® Synthesizer.
Thirty-five cycles are performed, each consisting of denaturing (95°C; 30 seconds),
annealing (55°C; 1 minute), and extension (72°C; 90 seconds), except during the first
cycle in which the denaturing time is increased to 5 minutes, and during the last cycle
in which the extension time is increased to 5 minutes.
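
Because the reverse primer is written 5' to 3' on the opposite strand, its binding site on the coding strand is its reverse complement. The sketch below computes that and locates both primer sites on a template. The two primer sequences are the ones given above; the helper names and the toy template (built from the primers with ten filler bases) are assumptions for illustration, not part of the sequence listing.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

FORWARD_PRIMER = "GCAAAAGCGTCCAGAAAGGA"   # BRCA1-11K-F, SEQ ID NO:69
REVERSE_PRIMER = "AGTCTTCCAATTCACTGCAC"   # BRCA1-11K-R, SEQ ID NO:70


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def amplicon_length(template, forward, reverse):
    """Length of the product bounded by the two primers, or None if not found.

    Requires exact matches of the forward primer and of the reverse
    complement of the reverse primer on the given strand.
    """
    start = template.find(forward)
    if start == -1:
        return None
    site = template.find(reverse_complement(reverse), start + len(forward))
    if site == -1:
        return None
    return site + len(reverse) - start


# Toy template for illustration only: flanks plus ten filler bases.
toy_template = (
    "TT" + FORWARD_PRIMER + "ACGTACGTAC" + reverse_complement(REVERSE_PRIMER) + "GG"
)
toy_length = amplicon_length(toy_template, FORWARD_PRIMER, REVERSE_PRIMER)
```

An exact-match search is a simplification; a primer can still anneal across a single mismatched base, which this sketch would report as not found.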

PCR products are purified using Qia-quick® PCR purification kits (Qiagen, cat# 28104;
Chatsworth, CA). Yield and purity of the PCR product are determined
spectrophotometrically at OD260 on a Beckman DU 650 spectrophotometer.

2. Dideoxy Sequence Analysis 

Fluorescent dye is attached to PCR products for automated sequencing using the Taq
Dye Terminator® Kit (Perkin-Elmer cat# 401628). DNA sequencing is performed in
both forward and reverse directions on an Applied Biosystems, Inc. (ABI) Foster City,
CA., automated Model 377® sequencer. The software used for analysis of the resulting
data is "Sequence Navigator® software" purchased through ABI. The BRCA1(omi2) SEQ.
ID. NO.: 3 sequence is entered into the Sequence Navigator® software as the Standard
for comparison. The Sequence Navigator® software compares the sample sequence to
the BRCA1(omi2) SEQ. ID. NO.: 3 standard, base by base. The Sequence Navigator®
software highlights all differences between the BRCA1(omi2) SEQ. ID. NO.: 3 DNA
sequence and the patient's sample sequence.

A first technologist checks the computerized results by comparing visually the
BRCA1(omi2) SEQ. ID. NO.: 3 standard against the patient's sample, and again highlights
any differences between the standard and the sample. The first primary technologist
then interprets the sequence variations at each position along the sequence.
Chromatograms from each sequence variation are generated by the Sequence
Navigator® software and printed on a color printer. The peaks are interpreted by the
first primary technologist and a second primary technologist. A secondary technologist
then reviews the chromatograms. The results are finally interpreted by a geneticist. In
each instance, a variation is compared to known polymorphisms for position and base
change. Mutations are noted by the length of non-matching variation. Such a lengthy
mismatch pattern occurs with deletions and substitutions.

3. Result 

Using the above PCR amplification and standard fluorescent sequencing
technology, the 3888delGA mutation may be found. The 3888delGA mutation of the
BRCA1 gene lies in segment "K" of exon 11. The DNA sequence results demonstrate
the presence of a two base pair deletion at nucleotides 3888 and 3889 of the published
BRCA1(omi) sequence. This mutation interrupts the reading frame of the BRCA1
transcript, resulting in the appearance of an in-frame terminator (TAG) at codon
position 1265. This mutation is, therefore, predicted to result in a truncated, and most
likely, non-functional protein. The formal name of the mutation will be 3888delGA.
This mutation is named in accordance with the suggested nomenclature for naming
mutations, Baudet, A. et al., Human Mutation 2:245-248 (1993).
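
The frameshift described above can be traced on a short stretch of the listed sequence. The sketch below uses nucleotides 3885 to 3920 of SEQ ID NO:1 as they appear in the sequence listing, which begin at codon 1256 (ACA, Thr); the helper names are hypothetical and the fragment boundaries were chosen only to keep the example short.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Nucleotides 3885-3920 of SEQ ID NO:1 as listed; position 3885 starts codon 1256.
FRAGMENT = "ACAGAGGAGAATTTATTATCATTGAAGAATAGCTTA"
FRAGMENT_START_POS = 3885
FRAGMENT_START_CODON = 1256


def delete_bases(fragment, position, length):
    """Remove `length` bases starting at sequence position `position`."""
    index = position - FRAGMENT_START_POS
    return fragment[:index] + fragment[index + length:]


def first_stop_codon(fragment):
    """Return the codon number of the first in-frame stop codon, or None."""
    for i in range(0, len(fragment) - 2, 3):
        if fragment[i:i + 3] in STOP_CODONS:
            return FRAGMENT_START_CODON + i // 3
    return None


deleted = FRAGMENT[3888 - FRAGMENT_START_POS:3890 - FRAGMENT_START_POS]  # "GA"
mutant = delete_bases(FRAGMENT, 3888, 2)
normal_stop = first_stop_codon(FRAGMENT)   # None: no stop in this stretch
mutant_stop = first_stop_codon(mutant)     # 1265, reading TAG
```

On this fragment the shifted frame reads ACA GGA GAA TTT ATT ATC ATT GAA GAA TAG, placing the terminator at codon 1265 as stated in the text.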


EXAMPLE 5 

USE OF THE BRCA1(omi1) GENE THERAPY

The growth of ovarian, breast or prostate cancer can be arrested by increasing the
expression of the BRCA1 gene where inadequate expression of that gene is responsible
for hereditary ovarian, breast and prostate cancer. It has been demonstrated that
transfection of BRCA1 into cancer cells inhibits their growth and reduces
tumorigenesis. Gene therapy is performed on a patient to reduce the size of a tumor.
The LXSN vector is transformed with any of the BRCA1(omi1) SEQ. ID. NO.: 1,
BRCA1(omi2) SEQ. ID. NO.: 3, or BRCA1(omi3) SEQ. ID. NO.: 5 coding region.

Vector 

The LXSN vector is transformed with wildtype BRCA1(omi1) SEQ. ID. NO.: 1 coding
sequence. The LXSN-BRCA1(omi1) retroviral expression vector is constructed by cloning
a SalI-linkered BRCA1(omi1) cDNA (nucleotides 1-5711) into the XhoI site of the vector
LXSN. Constructs are confirmed by DNA sequencing. Holt et al., Nature Genetics 12:
298-302 (1996).

Retroviral vectors are manufactured from viral producer cells using serum free and
phenol-red free conditions and tested for sterility, absence of specific pathogens, and
absence of replication-competent retrovirus by standard assays. Retrovirus is stored
frozen in aliquots which have been tested.

Patients receive a complete physical exam, blood, and urine tests to determine 
overall health. They may also have a chest X-ray, electrocardiogram, and appropriate 
radiologic procedures to assess tumor stage. 

Patients with metastatic ovarian cancer are treated with retroviral gene therapy by
infusion of recombinant LXSN-BRCA1(omi1) retroviral vectors into peritoneal sites
containing tumor, between 10^9 and 10^10 viral particles per dose. Blood samples are
drawn each day and tested for the presence of retroviral vector by sensitive polymerase
chain reaction (PCR)-based assays. The fluid which is removed is analyzed to
determine:



1. The percentage of cancer cells which are taking up the recombinant
LXSN-BRCA1(omi1) retroviral vector combination. Successful transfer of BRCA1 gene into
cancer cells is shown by both RT-PCR analysis and in situ hybridization.

RT-PCR is performed by the method of Thompson et al., Nature Genetics 9: 444-450
(1995), using primers derived from BRCA1(omi1) SEQ. ID. NO.: 1. Cell lysates are
prepared and immunoblotting is performed by the method of Jensen et al., Nature
Genetics 12: 303-308 (1996) and Jensen et al., Biochemistry 21: 10887-10892 (1992).

2. Presence of programmed cell death using ApoTAG® in situ apoptosis detection kit 
(Oncor, Inc., Gaithersburg, Maryland) and DNA analysis. 

3. Measurement of BRCA1 gene expression by slide immunofluorescence or western
blot.

Patients with measurable disease are also evaluated for a clinical response to
LXSN-BRCA1, especially those that do not undergo a palliative intervention immediately
after retroviral vector therapy. Fluid cytology, abdominal girth, CT scans of the abdomen,
and local symptoms are followed.

For other sites of disease, conventional response criteria are used as follows:

1. Complete Response (CR), complete disappearance of all measurable lesions 
and of all signs and symptoms of disease for at least 4 weeks. 

2. Partial Response (PR), decrease of at least 50% of the sum of the products of the 
2 largest perpendicular diameters of all measurable lesions as determined by 2 
observations not less than 4 weeks apart. To be considered a PR, no new lesions should 
have appeared during this period and none should have increased in size. 

3. Stable Disease, less than 25% change in tumor volume from previous 
evaluations. 

4. Progressive Disease, greater than 25% increase in tumor measurements from 
prior evaluations. 
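
The four response categories above can be expressed as a simple decision rule. The sketch below is a simplification under stated assumptions: a single percent change stands in for the tumor measurement, the 4-week confirmation intervals are not modeled, and the function name, parameters, and the fallback label are hypothetical. Changes that fall between the listed thresholds (for example a 30% decrease) are reported as unclassified rather than forced into a category.

```python
def classify_response(percent_change, lesions_gone=False, new_lesions=False):
    """Map a tumor measurement change to the response categories listed above.

    percent_change: change from the prior evaluation, negative for shrinkage.
    lesions_gone: all measurable lesions and signs of disease have disappeared.
    new_lesions: new lesions appeared, or a lesion increased in size.
    """
    if lesions_gone and not new_lesions:
        return "Complete Response"
    if percent_change <= -50 and not new_lesions:
        return "Partial Response"
    if percent_change > 25:
        return "Progressive Disease"
    if abs(percent_change) < 25:
        return "Stable Disease"
    return "Not classified by the listed criteria"
```

For example, `classify_response(-60)` falls under Partial Response and `classify_response(40)` under Progressive Disease.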

The number of doses depends upon the response to treatment. 

For further information related to this gene therapy approach, see "BRCA1
Retroviral Gene Therapy for Ovarian Cancer," a Human Gene Transfer Protocol: NIH
ORDA Registration #: 9603-149, Jeffrey Holt, JT, M.D. and Carlos L. Arteaga, M.D.

"Breast and Ovarian cancer" is understood by those skilled in the art to
include breast and ovarian cancer in women and also breast and prostate cancer in men.
BRCA1 is associated with genetic susceptibility to inherited breast and ovarian cancer in
women and also breast and prostate cancer in men. Therefore, claims in this document
which recite breast and/or ovarian cancer refer to breast, ovarian and prostate cancers in
men and women. Although the invention has been described with reference to the
presently preferred embodiments, it should be understood that various modifications
can be made without departing from the spirit of the invention. Accordingly, the
invention is limited only by the following claims.






SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: Murphy, Patricia D.
Allen, Antonette C.
Alvares, Christopher P.
Critz, Brenda S.
Olson, Sheri J.
Schelter, Denise B.
Zeng, Bin

(ii) TITLE OF INVENTION: A Sequence of the Human BRCA1 Gene

(iii) NUMBER OF SEQUENCES: 78

(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: ONCORMED
(B) STREET: 200 PERRY PARKWAY
(C) CITY: GAITHERSBURG
(D) STATE: MD
(E) COUNTRY: USA
(F) ZIP: 20877

(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER: to be assigned
(B) FILING DATE: herewith
(C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: R. THOMAS GALLEGOS
(B) REGISTRATION NUMBER: 32,692
(C) REFERENCE/DOCKET NUMBER: PA-0054

(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: 301-527-2051
(B) TELEFAX: 301-208-6997

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5711 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: not relevant
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Homo sapiens
(B) STRAIN: BRCA1

(viii) POSITION IN GENOME:
(A) CHROMOSOME/SEGMENT: 17
(B) MAP POSITION: 17q21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

AGCTCGCTGA GACTTCCTGG ACCCCGCACC AGGCTGTGGG GTTTCTCAGA TAACTGGGCC 60 

CCTGCGCTCA GGAGGCCTTC ACCCTCTGCT CTGGGTAAAG TTCATTGGAA CAGAAAGAAA 120

TGGATTTATC TGCTCTTCGC GTTGAAGAAG TACAAAATGT CATTAATGCT ATGCAGAAAA 180

TCTTAGAGTG TCCCATCTGT CTGGAGTTGA TCAAGGAACC TGTCTCCACA AAGTGTGACC 240

ACATATTTTG CAAATTTTGC ATGCTGAAAC TTCTCAACCA GAAGAAAGGG CCTTCACAGT 300

GTCCTTTATG TAAGAATGAT ATAACCAAAA GGAGCCTACA AGAAAGTACG AGATTTAGTC 360

AACTTGTTGA AGAGCTATTG AAAATCATTT GTGCTTTTCA GCTTGACACA GGTTTGGAGT 420

ATGCAAACAG CTATAATTTT GCAAAAAAGG AAAATAACTC TCCTGAACAT CTAAAAGATG 480

AAGTTTCTAT CATCCAAAGT ATGGGCTACA GAAACCGTGC CAAAAGACTT CTACAGAGTG 540

AACCCGAAAA TCCTTCCTTG CAGGAAACCA GTCTCAGTGT CCAACTCTCT AACCTTGGAA 600

CTGTGAGAAC TCTGAGGACA AAGCAGCGGA TACAACCTCA AAAGACGTCT GTCTACATTG 660

AATTGGGATC TGATTCTTCT GAAGATACCG TTAATAAGGC AACTTATTGC AGTGTGGGAG 720

ATCAAGAATT GTTACAAATC ACCCCTCAAG GAACCAGGGA TGAAATCAGT TTGGATTCTG 780

CAAAAAAGGC TGCTTGTGAA TTTTCTGAGA CGGATGTAAC AAATACTGAA CATCATCAAC 840

CCAGTAATAA TGATTTGAAC ACCACTGAGA AGCGTGCAGC TGAGAGGCAT CCAGAAAAGT 900

ATCAGGGTAG TTCTGTTTCA AACTTGCATG TGGAGCCATG TGGCACAAAT ACTCATGCCA 960

GCTCATTACA GCATGAGAAC AGCAGTTTAT TACTCACTAA AGACAGAATG AATGTAGAAA 1020

AGGCTGAATT CTGTAATAAA AGCAAACAGC CTGGCTTAGC AAGGAGCCAA CATAACAGAT 1080

GGGCTGGAAG TAAGGAAACA TGTAATGATA GGCGGACTCC CAGCACAGAA AAAAAGGTAG 1140

ATCTGAATGC TGATCCCCTG TGTGAGAGAA AAGAATGGAA TAAGCAGAAA CTGCCATGCT 1200

CAGAGAATCC TAGAGATACT GAAGATGTTC CTTGGATAAC ACTAAATAGC AGCATTCAGA 1260

AAGTTAATGA GTGGTTTTCC AGAAGTGATG AACTGTTAGG TTCTGATGAC TCACATGATG 1320

GGGAGTCTGA ATCAAATGCC AAAGTAGCTG ATGTATTGGA CGTTCTAAAT GAGGTAGATG 1380

AATATTCTGG TTCTTCAGAG AAAATAGACT TACTGGCCAG TGATCCTCAT GAGGCTTTAA 1440

TATGTAAAAG TGAAAGAGTT CACTCCAAAT CAGTAGAGAG TAATATTGAA GACAAAATAT 1500

TTGGGAAAAC CTATCGGAAG AAGGCAAGCC TCCCCAACTT AAGCCATGTA ACTGAAAATC 1560

TAATTATAGG AGCATTTGTT ACTGAGCCAC AGATAATACA AGAGCGTCCC CTCACAAATA 1620

AATTAAAGCG TAAAAGGAGA CCTACATCAG GCCTTCATCC TGAGGATTTT ATCAAGAAAG 1680

CAGATTTGGC AGTTCAAAAG ACTCCTGAAA TGATAAATCA GGGAACTAAC CAAACGGAGC 1740

AGAATGGTCA AGTGATGAAT ATTACTAATA GTGGTCATGA GAATAAAACA AAAGGTGATT 1800

CTATTCAGAA TGAGAAAAAT CCTAACCCAA TAGAATCACT CGAAAAAGAA TCTGCTTTCA 1860

AAACGAAAGC TGAACCTATA AGCAGCAGTA TAAGCAATAT GGAACTCGAA TTAAATATCC 1920

ACAATTCAAA AGCACCTAAA AAGAATAGGC TGAGGAGGAA GTCTTCTACC AGGCATATTC 1980

ATGCGCTTGA ACTAGTAGTC AGTAGAAATC TAAGCCCACC TAATTGTACT GAATTGCAAA 2040

TTGATAGTTG TTCTAGCAGT GAAGAGATAA AGAAAAAAAA GTACAACCAA ATGCCAGTCA 2100

GGCACAGCAG AAACCTACAA CTCATGGAAG GTAAAGAACC TGCAACTGGA GCCAAGAAGA 2160

GTAACAAGCC AAATGAACAG ACAAGTAAAA GACATGACAG TGATACTTTC CCAGAGCTGA 2220

AGTTAACAAA TGCACCTGGT TCTTTTACTA AGTGTTCAAA TACCAGTGAA CTTAAAGAAT 2280

TTGTCAATCC TAGCCTTCCA AGAGAAGAAA AAGAAGAGAA ACTAGAAACA GTTAAAGTGT 2340

CTAATAATGC TGAAGACCCC AAAGATCTCA TGTTAAGTGG AGAAAGGGTT TTGCAAACTG 2400

AAAGATCTGT AGAGAGTAGC AGTATTTCAC TGGTACCTGG TACTGATTAT GGCACTCAGG 2460

AAAGTATCTC GTTACTGGAA GTTAGCACTC TAGGGAAGGC AAAAACAGAA CCAAATAAAT 2520

GTGTGAGTCA GTGTGCAGCA TTTGAAAACC CCAAGGGACT AATTCATGGT TGTTCCAAAG 2580

ATAATAGAAA TGACACAGAA GGCTTTAAGT ATCCATTGGG ACATGAAGTT AACCACAGTC 2640

GGGAAACAAG CATAGAAATG GAAGAAAGTG AACTTGATGC TCAGTATTTG CAGAATACAT 2700

TCAAGGTTTC AAAGCGCCAG TCATTTGCTC TGTTTTCAAA TCCAGGAAAT GCAGAAGAGG 2760

AATGTGCAAC ATTCTCTGCC CACTCTGGGT CCTTAAAGAA ACAAAGTCCA AAAGTCACTT 2820

TTGAATGTGA ACAAAAGGAA GAAAATCAAG GAAAGAATGA GTCTAATATC AAGCCTGTAC 2880

AGACAGTTAA TATCACTGCA GGCTTTCCTG TGGTTGGTCA GAAAGATAAG CCAGTTGATA 2940

ATGCCAAATG TAGTATCAAA GGAGGCTCTA GGTTTTGTCT ATCATCTCAG TTCAGAGGCA 3000

ACGAAACTGG ACTCATTACT CCAAATAAAC ATGGACTTTT ACAAAACCCA TATCGTATAC 3060

CACCACTTTT TCCCATCAAG TCATTTGTTA AAACTAAATG TAAGAAAAAT CTGCTAGAGG 3120

AAAACTTTGA GGAACATTCA ATGTCACCTG AAAGAGAAAT GGGAAATGAG AACATTCCAA 3180

GTACAGTGAG CACAATTAGC CGTAATAACA TTAGAGAAAA TGTTTTTAAA GGAGCCAGCT 3240

CAAGCAATAT TAATGAAGTA GGTTCCAGTA CTAATGAAGT GGGCTCCAGT ATTAATGAAA 3300

TAGGTTCCAG TGATGAAAAC ATTCAAGCAG AACTAGGTAG AAACAGAGGG CCAAAATTGA 3360

ATGCTATGCT TAGATTAGGG GTTTTGCAAC CTGAGGTCTA TAAACAAAGT CTTCCTGGAA 3420

GTAATTGTAA GCATCCTGAA ATAAAAAAGC AAGAATATGA AGAAGTAGTT CAGACTGTTA 3480

ATACAGATTT CTCTCCATAT CTGATTTCAG ATAACTTAGA ACAGCCTATG GGAAGTAGTC 3540

ATGCATCTCA GGTTTGTTCT GAGACACCTG ATGACCTGTT AGATGATGGT GAAATAAAGG 3600

AAGATACTAG TTTTGCTGAA AATGACATTA AGGAAAGTTC TGCTGTTTTT AGCAAAAGCG 3660

TCCAGAGAGG AGAGCTTAGC AGGAGTCCTA GCCCTTTCAC CCATACACAT TTGGCTCAGG 3720

GTTACCGAAG AGGGGCCAAG AAATTAGAGT CCTCAGAAGA GAACTTATCT AGTGAGGATG 3780

AAGAGCTTCC CTGCTTCCAA CACTTGTTAT TTGGTAAAGT AAACAATATA CCTTCTCAGT 3840

CTACTAGGCA TAGCACCGTT GCTACCGAGT GTCTGTCTAA GAACACAGAG GAGAATTTAT 3900




TATCATTGAA GAATAGCTTA AATGACTGCA GTAACCAGGT AATATTGGCA AAGGCATCTC 3960

AGGAACATCA CCTTAGTGAG GAAACAAAAT GTTCTGCTAG CTTGTTTTCT TCACAGTGCA 4020

GTGAATTGGA AGACTTGACT GCAAATACAA ACACCCAGGA TCCTTTCTTG ATTGGTTCTT 4080

CCAAACAAAT GAGGCATCAG TCTGAAAGCC AGGGAGTTGG TCTGAGTGAC AAGGAATTGG 4140

TTTCAGATGA TGAAGAAAGA GGAACGGGCT TGGAAGAAAA TAATCAAGAA GAGCAAAGCA 4200

TGGATTCAAA CTTAGGTGAA GCAGCATCTG GGTGTGAGAG TGAAACAAGC GTCTCTGAAG 4260

ACTGCTCAGG GCTATCCTCT CAGAGTGACA TTTTAACCAC TCAGCAGAGG GATACCATGC 4320

AACATAACCT GATAAAGCTC CAGCAGGAAA TGGCTGAACT AGAAGCTGTG TTAGAACAGC 4380

ATGGGAGCCA GCCTTCTAAC AGCTACCCTT CCATCATAAG TGACTCCTCT GCCCTTGAGG 4440

ACCTGCGAAA TCCAGAACAA AGCACATCAG AAAAAGCAGT ATTAACTTCA CAGAAAAGTA 4500

GTGAATACCC TATAAGCCAG AATCCAGAAG GCCTTTCTGC TGACAAGTTT GAGGTGTCTG 4560

CAGATAGTTC TACCAGTAAA AATAAAGAAC CAGGAGTGGA AAGGTCATCC CCTTCTAAAT 4620

GCCCATCATT AGATGATAGG TGGTACATGC ACAGTTGCTC TGGGAGTCTT CAGAATAGAA 4680

ACTACCCATC TCAAGAGGAG CTCATTAAGG TTGTTGATGT GGAGGAGCAA CAGCTGGAAG 4740

AGTCTGGGCC ACACGATTTG ACGGAAACAT CTTACTTGCC AAGGCAAGAT CTAGAGGGAA 4800

CCCCTTACCT GGAATCTGGA ATCAGCCTCT TCTCTGATGA CCCTGAATCT GATCCTTCTG 4860

AAGACAGAGC CCCAGAGTCA GCTCGTGTTG GCAACATACC ATCTTCAACC TCTGCATTGA 4920

AAGTTCCCCA ATTGAAAGTT GCAGAATCTG CCCAGGGTCC AGCTGCTGCT CATACTACTG 4980

ATACTGCTGG GTATAATGCA ATGGAAGAAA GTGTGAGCAG GGAGAAGCCA GAATTGACAG 5040

CTTCAACAGA AAGGGTCAAC AAAAGAATGT CCATGGTGGT GTCTGGCCTG ACCCCAGAAG 5100

AATTTATGCT CGTGTACAAG TTTGCCAGAA AACACCACAT CACTTTAACT AATCTAATTA 5160

CTGAAGAGAC TACTCATGTT GTTATGAAAA CAGATGCTGA GTTTGTGTGT GAACGGACAC 5220

TGAAATATTT TCTAGGAATT GCGGGAGGAA AATGGGTAGT TAGCTATTTC TGGGTGACCC 5280

AGTCTATTAA AGAAAGAAAA ATGCTGAATG AGCATGATTT TGAAGTCAGA GGAGATGTGG 5340

TCAATGGAAG AAACCACCAA GGTCCAAAGC GAGCAAGAGA ATCCCAGGAC AGAAAGATCT 5400

TCAGGGGGCT AGAAATCTGT TGCTATGGGC CCTTCACCAA CATGCCCACA GATCAACTGG 5460

AATGGATGGT ACAGCTGTGT GGTGCTTCTG TGGTGAAGGA GCTTTCATCA TTCACCCTTG 5520

GCACAGGTGT CCACCCAATT GTGGTTGTGC AGCCAGATGC CTGGACAGAG GACAATGGCT 5580

TCCATGCAAT TGGGCAGATG TGTGAGGCAC CTGTGGTGAC CCGAGAGTGG GTGTTGGACA 5640

GTGTAGCACT CTACCAGTGC CAGGAGCTGG ACACCTACCT GATACCCCAG ATCCCCCACA 5700

GCCACTACTG A 5711

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1863 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: not relevant
(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Homo sapiens
(B) STRAIN: BRCA1

(viii) POSITION IN GENOME:
(A) CHROMOSOME/SEGMENT: 17
(B) MAP POSITION: 17q21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Met Asp Leu Ser Ala Leu Arg Val Glu Glu Val Gln Asn Val Ile Asn
1 5 10 15

Ala Met Gln Lys Ile Leu Glu Cys Pro Ile Cys Leu Glu Leu Ile Lys
20 25 30

Glu Pro Val Ser Thr Lys Cys Asp His Ile Phe Cys Lys Phe Cys Met
35 40 45

Leu Lys Leu Leu Asn Gln Lys Lys Gly Pro Ser Gln Cys Pro Leu Cys
50 55 60

Lys Asn Asp Ile Thr Lys Arg Ser Leu Gln Glu Ser Thr Arg Phe Ser
65 70 75 80

Gln Leu Val Glu Glu Leu Leu Lys Ile Ile Cys Ala Phe Gln Leu Asp
85 90 95

Thr Gly Leu Glu Tyr Ala Asn Ser Tyr Asn Phe Ala Lys Lys Glu Asn
100 105 110

Asn Ser Pro Glu His Leu Lys Asp Glu Val Ser Ile Ile Gln Ser Met
115 120 125

Gly Tyr Arg Asn Arg Ala Lys Arg Leu Leu Gln Ser Glu Pro Glu Asn
130 135 140

Pro Ser Leu Gln Glu Thr Ser Leu Ser Val Gln Leu Ser Asn Leu Gly
145 150 155 160

Thr Val Arg Thr Leu Arg Thr Lys Gln Arg Ile Gln Pro Gln Lys Thr
165 170 175

Ser Val Tyr Ile Glu Leu Gly Ser Asp Ser Ser Glu Asp Thr Val Asn
180 185 190

Lys Ala Thr Tyr Cys Ser Val Gly Asp Gln Glu Leu Leu Gln Ile Thr
195 200 205



Pro Gln Gly Thr Arg Asp Glu Ile Ser Leu Asp Ser Ala Lys Lys Ala
210 215 220

Ala Cys Glu Phe Ser Glu Thr Asp Val Thr Asn Thr Glu His His Gln
225 230 235 240

Pro Ser Asn Asn Asp Leu Asn Thr Thr Glu Lys Arg Ala Ala Glu Arg
245 250 255

His Pro Glu Lys Tyr Gln Gly Ser Ser Val Ser Asn Leu His Val Glu
260 265 270

Pro Cys Gly Thr Asn Thr His Ala Ser Ser Leu Gln His Glu Asn Ser
275 280 285

Ser Leu Leu Leu Thr Lys Asp Arg Met Asn Val Glu Lys Ala Glu Phe
290 295 300

Cys Asn Lys Ser Lys Gln Pro Gly Leu Ala Arg Ser Gln His Asn Arg
305 310 315 320

Trp Ala Gly Ser Lys Glu Thr Cys Asn Asp Arg Arg Thr Pro Ser Thr
325 330 335

Glu Lys Lys Val Asp Leu Asn Ala Asp Pro Leu Cys Glu Arg Lys Glu
340 345 350

Trp Asn Lys Gln Lys Leu Pro Cys Ser Glu Asn Pro Arg Asp Thr Glu
355 360 365

Asp Val Pro Trp Ile Thr Leu Asn Ser Ser Ile Gln Lys Val Asn Glu
370 375 380

Trp Phe Ser Arg Ser Asp Glu Leu Leu Gly Ser Asp Asp Ser His Asp
385 390 395 400

Gly Glu Ser Glu Ser Asn Ala Lys Val Ala Asp Val Leu Asp Val Leu
405 410 415

Asn Glu Val Asp Glu Tyr Ser Gly Ser Ser Glu Lys Ile Asp Leu Leu
420 425 430

Ala Ser Asp Pro His Glu Ala Leu Ile Cys Lys Ser Glu Arg Val His
435 440 445

Ser Lys Ser Val Glu Ser Asn Ile Glu Asp Lys Ile Phe Gly Lys Thr
450 455 460

Tyr Arg Lys Lys Ala Ser Leu Pro Asn Leu Ser His Val Thr Glu Asn
465 470 475 480

Leu Ile Ile Gly Ala Phe Val Thr Glu Pro Gln Ile Ile Gln Glu Arg
485 490 495

Pro Leu Thr Asn Lys Leu Lys Arg Lys Arg Arg Pro Thr Ser Gly Leu
500 505 510

His Pro Glu Asp Phe Ile Lys Lys Ala Asp Leu Ala Val Gln Lys Thr
515 520 525

Pro Glu Met Ile Asn Gln Gly Thr Asn Gln Thr Glu Gln Asn Gly Gln
530 535 540

Val Met Asn Ile Thr Asn Ser Gly His Glu Asn Lys Thr Lys Gly Asp
545 550 555 560

Ser Ile Gln Asn Glu Lys Asn Pro Asn Pro Ile Glu Ser Leu Glu Lys
565 570 575

Glu Ser Ala Phe Lys Thr Lys Ala Glu Pro Ile Ser Ser Ser Ile Ser
580 585 590

Asn Met Glu Leu Glu Leu Asn Ile His Asn Ser Lys Ala Pro Lys Lys
595 600 605

Asn Arg Leu Arg Arg Lys Ser Ser Thr Arg His Ile His Ala Leu Glu
610 615 620

Leu Val Val Ser Arg Asn Leu Ser Pro Pro Asn Cys Thr Glu Leu Gln
625 630 635 640

Ile Asp Ser Cys Ser Ser Ser Glu Glu Ile Lys Lys Lys Lys Tyr Asn
645 650 655

Gln Met Pro Val Arg His Ser Arg Asn Leu Gln Leu Met Glu Gly Lys
660 665 670

Glu Pro Ala Thr Gly Ala Lys Lys Ser Asn Lys Pro Asn Glu Gln Thr
675 680 685

Ser Lys Arg His Asp Ser Asp Thr Phe Pro Glu Leu Lys Leu Thr Asn
690 695 700

Ala Pro Gly Ser Phe Thr Lys Cys Ser Asn Thr Ser Glu Leu Lys Glu
705 710 715 720

Phe Val Asn Pro Ser Leu Pro Arg Glu Glu Lys Glu Glu Lys Leu Glu
725 730 735

Thr Val Lys Val Ser Asn Asn Ala Glu Asp Pro Lys Asp Leu Met Leu
740 745 750

Ser Gly Glu Arg Val Leu Gln Thr Glu Arg Ser Val Glu Ser Ser Ser
755 760 765

Ile Ser Leu Val Pro Gly Thr Asp Tyr Gly Thr Gln Glu Ser Ile Ser
770 775 780

Leu Leu Glu Val Ser Thr Leu Gly Lys Ala Lys Thr Glu Pro Asn Lys
785 790 795 800

Cys Val Ser Gln Cys Ala Ala Phe Glu Asn Pro Lys Gly Leu Ile His
805 810 815

Gly Cys Ser Lys Asp Asn Arg Asn Asp Thr Glu Gly Phe Lys Tyr Pro
820 825 830

Leu Gly His Glu Val Asn His Ser Arg Glu Thr Ser Ile Glu Met Glu
835 840 845

Glu Ser Glu Leu Asp Ala Gin Tyr Leu 'Gin Asn Thr Phe Lys Val Ser 
850 855 860 

Lys Arg Gin Ser Phe Ala Leu Phe Ser Asn Pro Gly Asn Ala Glu Glu 
865 870 875 880 

Glu Cys Ala Thr Phe Ser Ala His Ser Gly Ser Leu Lys Lys Gin Ser 

885 890 895 

Pro Lys Val Thr Phe Glu Cys Glu Gin Lys Glu Glu Asn Gin Gly Lys 

900 905 910 

Asn Glu Ser Asn lie Lys Pro Val Gin Thr Val Asn lie Thr Ala Gly 
915 920 925 

Phe Pro Val Val Gly Gin Lys Asp Lys Pro Val Asp Asn Ala Lys Cys 
930 935 940 

Ser lie Lys Gly Gly Ser Arg Phe Cys Leu Ser Ser Gin Phe Arg Gly 
945 950 955 960 

Asn Glu Thr Gly Leu lie Thr Pro Asn Lys His Gly Leu Leu Gin Asn 
965 970 975 

Pro Tyr Arg lie Pro Pro Leu Phe Pro lie Lys Ser Phe Val Lys Thr 
980 985 990 

Lys Cys Lys Lys Asn Leu Leu Glu Glu Asn Phe Glu Glu His Ser Met 
995 1000 1005 

Ser Pro Glu Arg Glu Met Gly Asn Glu Asn lie Pro Ser Thr Val Ser 
1010 1015 1020 

Thr lie Ser Arg Asn Asn lie Arg Glu Asn Val Phe Lys Gly Ala Ser 
1025 1030 1035 1040 

Ser Ser Asn lie Asn Glu Val Gly Ser Ser Thr Asn Glu Val Gly Ser 
1045 1050 1Q55 
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Ser lie Asn Glu lie Gly Ser Ser Asp Glu Asn lie Gin Ala Glu Leu 
1060 1065 1070 



Gly Arg Asn Arg Gly Pro Lys Leu Asn Ala Met Leu Arg Leu Gly Val 
1075 1080 1085 

Leu Gin Pro Glu Val Tyr Lys Gin Ser Leu Pro Gly Ser Asn Cys Lys 
1090 1095 1100 

His Pro Glu lie Lys Lys Gin Glu Tyr Glu Glu Val Val Gin Thr Val 
1105 1110 1115 1120 

Asn Thr Asp Phe Ser Pro Tyr Leu lie Ser Asp Asn Leu Glu Gin Pro 
1125 1130 1135 

Met Gly Ser Ser His Ala Ser Gin Val Cys Ser Glu Thr Pro Asp Asp 
1140 1145 1150 

Leu Leu Asp Asp Gly Glu lie Lys Glu Asp Thr Ser Phe Ala Glu Asn 
1155 1160 1165 

Asp lie Lys Glu Ser Ser Ala Val Phe Ser Lys Ser Val Gin Arg Gly 
1170 1175 1180 

Glu Leu Ser Arg Ser Pro Ser Pro Phe Thr His Thr His Leu Ala Gin 
1185 1190 1195 1200 

Gly Tyr Arg Arg Gly Ala Lys Lys Leu Glu Ser Ser Glu Glu Asn Leu 
1205 1210 1215 

Ser Ser Glu Asp Glu Glu Leu Pro Cys Phe Gin His Leu Leu Phe Gly 
1220 1225 1230 

Lys Val Asn Asn lie Pro Ser Gin Ser Thr Arg His Ser Thr Val Ala 
1235 1240 1245 

Thr Glu Cys Leu Ser Lys Asn Thr Glu Glu Asn Leu Leu Ser Leu Lys 
1250 1255 1260 

Asn Ser Leu Asn Asp Cys Ser Asn Gin Val lie Leu Ala Lys Ala Ser 
1265 1270 1275 1280 
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Gin Glu .His His Leu Ser Glu Glu Thr ^ Lys Cys Ser Ala Ser Leu Phe 
1285 1290 1295 



Ser Ser Gln Cys Ser Glu Leu Glu Asp Leu Thr Ala Asn Thr Asn Thr 
1300 1305 1310 

Gln Asp Pro Phe Leu Ile Gly Ser Ser Lys Gln Met Arg His Gln Ser 
1315 1320 1325 

Glu Ser Gln Gly Val Gly Leu Ser Asp Lys Glu Leu Val Ser Asp Asp 
1330 1335 1340 

Glu Glu Arg Gly Thr Gly Leu Glu Glu Asn Asn Gln Glu Glu Gln Ser 
1345 1350 1355 1360 

Met Asp Ser Asn Leu Gly Glu Ala Ala Ser Gly Cys Glu Ser Glu Thr 
1365 1370 1375 

Ser Val Ser Glu Asp Cys Ser Gly Leu Ser Ser Gln Ser Asp Ile Leu 
1380 1385 1390 

Thr Thr Gln Gln Arg Asp Thr Met Gln His Asn Leu Ile Lys Leu Gln 
1395 1400 1405 

Gln Glu Met Ala Glu Leu Glu Ala Val Leu Glu Gln His Gly Ser Gln 
1410 1415 1420 

Pro Ser Asn Ser Tyr Pro Ser Ile Ile Ser Asp Ser Ser Ala Leu Glu 
1425 1430 1435 1440 

Asp Leu Arg Asn Pro Glu Gln Ser Thr Ser Glu Lys Ala Val Leu Thr 
1445 1450 1455 

Ser Gln Lys Ser Ser Glu Tyr Pro Ile Ser Gln Asn Pro Glu Gly Leu 
1460 1465 1470 

Ser Ala Asp Lys Phe Glu Val Ser Ala Asp Ser Ser Thr Ser Lys Asn 
1475 1480 1485 

Lys Glu Pro Gly Val Glu Arg Ser Ser Pro Ser Lys Cys Pro Ser Leu 
1490 1495 1500 

Asp Asp Arg Trp Tyr Met His Ser Cys Ser Gly Ser Leu Gln Asn Arg 
1505 1510 1515 1520 

Asn Tyr Pro Ser Gln Glu Glu Leu Ile Lys Val Val Asp Val Glu Glu 
1525 1530 1535 

Gln Gln Leu Glu Glu Ser Gly Pro His Asp Leu Thr Glu Thr Ser Tyr 
1540 1545 1550 

Leu Pro Arg Gln Asp Leu Glu Gly Thr Pro Tyr Leu Glu Ser Gly Ile 
1555 1560 1565 

Ser Leu Phe Ser Asp Asp Pro Glu Ser Asp Pro Ser Glu Asp Arg Ala 
1570 1575 1580 

Pro Glu Ser Ala Arg Val Gly Asn Ile Pro Ser Ser Thr Ser Ala Leu 
1585 1590 1595 1600 

Lys Val Pro Gln Leu Lys Val Ala Glu Ser Ala Gln Gly Pro Ala Ala 
1605 1610 1615 

Ala His Thr Thr Asp Thr Ala Gly Tyr Asn Ala Met Glu Glu Ser Val 
1620 1625 1630 

Ser Arg Glu Lys Pro Glu Leu Thr Ala Ser Thr Glu Arg Val Asn Lys 
1635 1640 1645 

Arg Met Ser Met Val Val Ser Gly Leu Thr Pro Glu Glu Phe Met Leu 
1650 1655 1660 

Val Tyr Lys Phe Ala Arg Lys His His Ile Thr Leu Thr Asn Leu Ile 
1665 1670 1675 1680 

Thr Glu Glu Thr Thr His Val Val Met Lys Thr Asp Ala Glu Phe Val 
1685 1690 1695 

Cys Glu Arg Thr Leu Lys Tyr Phe Leu Gly Ile Ala Gly Gly Lys Trp 
1700 1705 1710 

Val Val Ser Tyr Phe Trp Val Thr Gln Ser Ile Lys Glu Arg Lys Met 
1715 1720 1725 

Leu Asn Glu His Asp Phe Glu Val Arg Gly Asp Val Val Asn Gly Arg 
1730 1735 1740 

Asn His Gln Gly Pro Lys Arg Ala Arg Glu Ser Gln Asp Arg Lys Ile 
1745 1750 1755 1760 

Phe Arg Gly Leu Glu Ile Cys Cys Tyr Gly Pro Phe Thr Asn Met Pro 
1765 1770 1775 

Thr Asp Gln Leu Glu Trp Met Val Gln Leu Cys Gly Ala Ser Val Val 
1780 1785 1790 

Lys Glu Leu Ser Ser Phe Thr Leu Gly Thr Gly Val His Pro Ile Val 
1795 1800 1805 

Val Val Gln Pro Asp Ala Trp Thr Glu Asp Asn Gly Phe His Ala Ile 
1810 1815 1820 

Gly Gln Met Cys Glu Ala Pro Val Val Thr Arg Glu Trp Val Leu Asp 
1825 1830 1835 1840 

Ser Val Ala Leu Tyr Gln Cys Gln Glu Leu Asp Thr Tyr Leu Ile Pro 
1845 1850 1855 

Gln Ile Pro His Ser His Tyr 
1860 



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5711 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(B) STRAIN: BRCA1 

(viii) POSITION IN GENOME: 

(A) CHROMOSOME/SEGMENT: 17 

(B) MAP POSITION: 17q21 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

AGCTCGCTGA GACTTCCTGG ACCCCGCACC AGGCTGTGGG GTTTCTCAGA TAACTGGGCC 60 

CCTGCGCTCA GGAGGCCTTC ACCCTCTGCT CTGGGTAAAG TTCATTGGAA CAGAAAGAAA 120 

TGGATTTATC TGCTCTTCGC GTTGAAGAAG TACAAAATGT CATTAATGCT ATGCAGAAAA 180 

TCTTAGAGTG TCCCATCTGT CTGGAGTTGA TCAAGGAACC TGTCTCCACA AAGTGTGACC 240 

ACATATTTTG CAAATTTTGC ATGCTGAAAC TTCTCAACCA GAAGAAAGGG CCTTCACAGT 300 

GTCCTTTATG TAAGAATGAT ATAACCAAAA GGAGCCTACA AGAAAGTACG AGATTTAGTC 360 

AACTTGTTGA AGAGCTATTG AAAATCATTT GTGCTTTTCA GCTTGACACA GGTTTGGAGT 420 

ATGCAAACAG CTATAATTTT GCAAAAAAGG AAAATAACTC TCCTGAACAT CTAAAAGATG 480 

AAGTTTCTAT CATCCAAAGT ATGGGCTACA GAAACCGTGC CAAAAGACTT CTACAGAGTG 540 

AACCCGAAAA TCCTTCCTTG CAGGAAACCA GTCTCAGTGT CCAACTCTCT AACCTTGGAA 600 

CTGTGAGAAC TCTGAGGACA AAGCAGCGGA TACAACCTCA AAAGACGTCT GTCTACATTG 660 

AATTGGGATC TGATTCTTCT GAAGATACCG TTAATAAGGC AACTTATTGC AGTGTGGGAG 720 

ATCAAGAATT GTTACAAATC ACCCCTCAAG GAACCAGGGA TGAAATCAGT TTGGATTCTG 780 




CAAAAAAGGC TGCTTGTGAA TTTTCTGAGA CGGATGTAAC AAATACTGAA CATCATCAAC 840 

CCAGTAATAA TGATTTGAAC ACCACTGAGA AGCGTGCAGC TGAGAGGCAT CCAGAAAAGT 900 

ATCAGGGTAG TTCTGTTTCA AACTTGCATG TGGAGCCATG TGGCACAAAT ACTCATGCCA 960 

GCTCATTACA GCATGAGAAC AGCAGTTTAT TACTCACTAA AGACAGAATG AATGTAGAAA 1020 

AGGCTGAATT CTGTAATAAA AGCAAACAGC CTGGCTTAGC AAGGAGCCAA CATAACAGAT 1080 

GGGCTGGAAG TAAGGAAACA TGTAATGATA GGCGGACTCC CAGCACAGAA AAAAAGGTAG 1140 

ATCTGAATGC TGATCCCCTG TGTGAGAGAA AAGAATGGAA TAAGCAGAAA CTGCCATGCT 1200 

CAGAGAATCC TAGAGATACT GAAGATGTTC CTTGGATAAC ACTAAATAGC AGCATTCAGA 1260 

AAGTTAATGA GTGGTTTTCC AGAAGTGATG AACTGTTAGG TTCTGATGAC TCACATGATG 1320 

GGGAGTCTGA ATCAAATGCC AAAGTAGCTG ATGTATTGGA CGTTCTAAAT GAGGTAGATG 1380 

AATATTCTGG TTCTTCAGAG AAAATAGACT TACTGGCCAG TGATCCTCAT GAGGCTTTAA 1440 

TATGTAAAAG TGAAAGAGTT CACTCCAAAT CAGTAGAGAG TAATATTGAA GACAAAATAT 1500 

TTGGGAAAAC CTATCGGAAG AAGGCAAGCC TCCCCAACTT AAGCCATGTA ACTGAAAATC 1560 

TAATTATAGG AGCATTTGTT ACTGAGCCAC AGATAATACA AGAGCGTCCC CTCACAAATA 1620 

AATTAAAGCG TAAAAGGAGA CCTACATCAG GCCTTCATCC TGAGGATTTT ATCAAGAAAG 1680 

CAGATTTGGC AGTTCAAAAG ACTCCTGAAA TGATAAATCA GGGAACTAAC CAAACGGAGC 1740 

AGAATGGTCA AGTGATGAAT ATTACTAATA GTGGTCATGA GAATAAAACA AAAGGTGATT 1800 

CTATTCAGAA TGAGAAAAAT CCTAACCCAA TAGAATCACT CGAAAAAGAA TCTGCTTTCA 1860 

AAACGAAAGC TGAACCTATA AGCAGCAGTA TAAGCAATAT GGAACTCGAA TTAAATATCC 1920 

ACAATTCAAA AGCACCTAAA AAGAATAGGC TGAGGAGGAA GTCTTCTACC AGGCATATTC 1980 

ATGCGCTTGA ACTAGTAGTC AGTAGAAATC TAAGCCCACC TAATTGTACT GAATTGCAAA 2040 

TTGATAGTTG TTCTAGCAGT GAAGAGATAA AGAAAAAAAA GTACAACCAA ATGCCAGTCA 2100 

GGCACAGCAG AAACCTACAA CTCATGGAAG GTAAAGAACC TGCAACTGGA GCCAAGAAGA 2160 

GTAACAAGCC AAATGAACAG ACAAGTAAAA GACATGACAG CGATACTTTC CCAGAGCTGA 2220 

AGTTAACAAA TGCACCTGGT TCTTTTACTA AGTGTTCAAA TACCAGTGAA CTTAAAGAAT 2280 

TTGTCAATCC TAGCCTTCCA AGAGAAGAAA AAGAAGAGAA ACTAGAAACA GTTAAAGTGT 2340 

CTAATAATGC TGAAGACCCC AAAGATCTCA TGTTAAGTGG AGAAAGGGTT TTGCAAACTG 2400 

AAAGATCTGT AGAGAGTAGC AGTATTTCAT TGGTACCTGG TACTGATTAT GGCACTCAGG 2460 

AAAGTATCTC GTTACTGGAA GTTAGCACTC TAGGGAAGGC AAAAACAGAA CCAAATAAAT 2520 

GTGTGAGTCA GTGTGCAGCA TTTGAAAACC CCAAGGGACT AATTCATGGT TGTTCCAAAG 2580 

ATAATAGAAA TGACACAGAA GGCTTTAAGT ATCCATTGGG ACATGAAGTT AACCACAGTC 2640 

GGGAAACAAG CATAGAAATG GAAGAAAGTG AACTTGATGC TCAGTATTTG CAGAATACAT 2700 

TCAAGGTTTC AAAGCGCCAG TCATTTGCTC TGTTTTCAAA TCCAGGAAAT GCAGAAGAGG 2760 

AATGTGCAAC ATTCTCTGCC CACTCTGGGT CCTTAAAGAA ACAAAGTCCA AAAGTCACTT 2820 

TTGAATGTGA ACAAAAGGAA GAAAATCAAG GAAAGAATGA GTCTAATATC AAGCCTGTAC 2880 

AGACAGTTAA TATCACTGCA GGCTTTCCTG TGGTTGGTCA GAAAGATAAG CCAGTTGATA 2940 

ATGCCAAATG TAGTATCAAA GGAGGCTCTA GGTTTTGTCT ATCATCTCAG TTCAGAGGCA 3000 

ACGAAACTGG ACTCATTACT CCAAATAAAC ATGGACTTTT ACAAAACCCA TATCGTATAC 3060 

CACCACTTTT TCCCATCAAG TCATTTGTTA AAACTAAATG TAAGAAAAAT CTGCTAGAGG 3120 

AAAACTTTGA GGAACATTCA ATGTCACCTG AAAGAGAAAT GGGAAATGAG AACATTCCAA 3180 

GTACAGTGAG CACAATTAGC CGTAATAACA TTAGAGAAAA TGTTTTTAAA GAAGCCAGCT 3240 

CAAGCAATAT TAATGAAGTA GGTTCCAGTA CTAATGAAGT GGGCTCCAGT ATTAATGAAA 3300 

TAGGTTCCAG TGATGAAAAC ATTCAAGCAG AACTAGGTAG AAACAGAGGG CCAAAATTGA 3360 

ATGCTATGCT TAGATTAGGG GTTTTGCAAC CTGAGGTCTA TAAACAAAGT CTTCCTGGAA 3420 

GTAATTGTAA GCATCCTGAA ATAAAAAAGC AAGAATATGA AGAAGTAGTT CAGACTGTTA 3480 

ATACAGATTT CTCTCCATAT CTGATTTCAG ATAACTTAGA ACAGCCTATG GGAAGTAGTC 3540 

ATGCATCTCA GGTTTGTTCT GAGACACCTG ATGACCTGTT AGATGATGGT GAAATAAAGG 3600 

AAGATACTAG TTTTGCTGAA AATGACATTA AGGAAAGTTC TGCTGTTTTT AGCAAAAGCG 3660 

TCCAGAAAGG AGAGCTTAGC AGGAGTCCTA GCCCTTTCAC CCATACACAT TTGGCTCAGG 3720 

GTTACCGAAG AGGGGCCAAG AAATTAGAGT CCTCAGAAGA GAACTTATCT AGTGAGGATG 3780 

AAGAGCTTCC CTGCTTCCAA CACTTGTTAT TTGGTAAAGT AAACAATATA CCTTCTCAGT 3840 

CTACTAGGCA TAGCACCGTT GCTACCGAGT GTCTGTCTAA GAACACAGAG GAGAATTTAT 3900 

TATCATTGAA GAATAGCTTA AATGACTGCA GTAACCAGGT AATATTGGCA AAGGCATCTC 3960 

AGGAACATCA CCTTAGTGAG GAAACAAAAT GTTCTGCTAG CTTGTTTTCT TCACAGTGCA 4020 

GTGAATTGGA AGACTTGACT GCAAATACAA ACACCCAGGA TCCTTTCTTG ATTGGTTCTT 4080 

CCAAACAAAT GAGGCATCAG TCTGAAAGCC AGGGAGTTGG TCTGAGTGAC AAGGAATTGG 4140 

TTTCAGATGA TGAAGAAAGA GGAACGGGCT TGGAAGAAAA TAATCAAGAA GAGCAAAGCA 4200 

TGGATTCAAA CTTAGGTGAA GCAGCATCTG GGTGTGAGAG TGAAACAAGC GTCTCTGAAG 4260 

ACTGCTCAGG GCTATCCTCT CAGAGTGACA TTTTAACCAC TCAGCAGAGG GATACCATGC 4320 

AACATAACCT GATAAAGCTC CAGCAGGAAA TGGCTGAACT AGAAGCTGTG TTAGAACAGC 4380 

ATGGGAGCCA GCCTTCTAAC AGCTACCCTT CCATCATAAG TGACTCTTCT GCCCTTGAGG 4440 

ACCTGCGAAA TCCAGAACAA AGCACATCAG AAAAAGCAGT ATTAACTTCA CAGAAAAGTA 4500 

GTGAATACCC TATAAGCCAG AATCCAGAAG GCCTTTCTGC TGACAAGTTT GAGGTGTCTG 4560 

CAGATAGTTC TACCAGTAAA AATAAAGAAC CAGGAGTGGA AAGGTCATCC CCTTCTAAAT 4620 

GCCCATCATT AGATGATAGG TGGTACATGC ACAGTTGCTC TGGGAGTCTT CAGAATAGAA 4680 

ACTACCCATC TCAAGAGGAG CTCATTAAGG TTGTTGATGT GGAGGAGCAA CAGCTGGAAG 4740 

AGTCTGGGCC ACACGATTTG ACGGAAACAT CTTACTTGCC AAGGCAAGAT CTAGAGGGAA 4800 

CCCCTTACCT GGAATCTGGA ATCAGCCTCT TCTCTGATGA CCCTGAATCT GATCCTTCTG 4860 

AAGACAGAGC CCCAGAGTCA GCTCGTGTTG GCAACATACC ATCTTCAACC TCTGCATTGA 4920 

AAGTTCCCCA ATTGAAAGTT GCAGAATCTG CCCAGAGTCC AGCTGCTGCT CATACTACTG 4980 

ATACTGCTGG GTATAATGCA ATGGAAGAAA GTGTGAGCAG GGAGAAGCCA GAATTGACAG 5040 

CTTCAACAGA AAGGGTCAAC AAAAGAATGT CCATGGTGGT GTCTGGCCTG ACCCCAGAAG 5100 

AATTTATGCT CGTGTACAAG TTTGCCAGAA AACACCACAT CACTTTAACT AATCTAATTA 5160 

CTGAAGAGAC TACTCATGTT GTTATGAAAA CAGATGCTGA GTTTGTGTGT GAACGGACAC 5220 

TGAAATATTT TCTAGGAATT GCGGGAGGAA AATGGGTAGT TAGCTATTTC TGGGTGACCC 5280 

AGTCTATTAA AGAAAGAAAA ATGCTGAATG AGCATGATTT TGAAGTCAGA GGAGATGTGG 5340 

TCAATGGAAG AAACCACCAA GGTCCAAAGC GAGCAAGAGA ATCCCAGGAC AGAAAGATCT 5400 

TCAGGGGGCT AGAAATCTGT TGCTATGGGC CCTTCACCAA CATGCCCACA GATCAACTGG 5460 

AATGGATGGT ACAGCTGTGT GGTGCTTCTG TGGTGAAGGA GCTTTCATCA TTCACCCTTG 5520 

GCACAGGTGT CCACCCAATT GTGGTTGTGC AGCCAGATGC CTGGACAGAG GACAATGGCT 5580 

TCCATGCAAT TGGGCAGATG TGTGAGGCAC CTGTGGTGAC CCGAGAGTGG GTGTTGGACA 5640 

GTGTAGCACT CTACCAGTGC CAGGAGCTGG ACACCTACCT GATACCCCAG ATCCCCCACA 5700 

GCCACTACTG A 5711 

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1863 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(B) STRAIN: BRCA1 

(viii) POSITION IN GENOME: 

(A) CHROMOSOME/SEGMENT: 17 

(B) MAP POSITION: 17q21 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

Met Asp Leu Ser Ala Leu Arg Val Glu Glu Val Gln Asn Val Ile Asn 
1 5 10 15 

Ala Met Gln Lys Ile Leu Glu Cys Pro Ile Cys Leu Glu Leu Ile Lys 
20 25 30 

Glu Pro Val Ser Thr Lys Cys Asp His Ile Phe Cys Lys Phe Cys Met 
35 40 45 

Leu Lys Leu Leu Asn Gln Lys Lys Gly Pro Ser Gln Cys Pro Leu Cys 
50 55 60 

Lys Asn Asp Ile Thr Lys Arg Ser Leu Gln Glu Ser Thr Arg Phe Ser 
65 70 75 80 

Gln Leu Val Glu Glu Leu Leu Lys Ile Ile Cys Ala Phe Gln Leu Asp 
85 90 95 

Thr Gly Leu Glu Tyr Ala Asn Ser Tyr Asn Phe Ala Lys Lys Glu Asn 
100 105 110 

Asn Ser Pro Glu His Leu Lys Asp Glu Val Ser Ile Ile Gln Ser Met 
115 120 125 

Gly Tyr Arg Asn Arg Ala Lys Arg Leu Leu Gln Ser Glu Pro Glu Asn 
130 135 140 

Pro Ser Leu Gln Glu Thr Ser Leu Ser Val Gln Leu Ser Asn Leu Gly 
145 150 155 160 

Thr Val Arg Thr Leu Arg Thr Lys Gln Arg Ile Gln Pro Gln Lys Thr 
165 170 175 

Ser Val Tyr Ile Glu Leu Gly Ser Asp Ser Ser Glu Asp Thr Val Asn 
180 185 190 

Lys Ala Thr Tyr Cys Ser Val Gly Asp Gln Glu Leu Leu Gln Ile Thr 
195 200 205 

Pro Gln Gly Thr Arg Asp Glu Ile Ser Leu Asp Ser Ala Lys Lys Ala 
210 215 220 

Ala Cys Glu Phe Ser Glu Thr Asp Val Thr Asn Thr Glu His His Gln 
225 230 235 240 

Pro Ser Asn Asn Asp Leu Asn Thr Thr Glu Lys Arg Ala Ala Glu Arg 
245 250 255 

His Pro Glu Lys Tyr Gln Gly Ser Ser Val Ser Asn Leu His Val Glu 
260 265 270 

Pro Cys Gly Thr Asn Thr His Ala Ser Ser Leu Gln His Glu Asn Ser 
275 280 285 

Ser Leu Leu Leu Thr Lys Asp Arg Met Asn Val Glu Lys Ala Glu Phe 
290 295 300 



Cys Asn Lys Ser Lys Gln Pro Gly Leu Ala Arg Ser Gln His Asn Arg 
305 310 315 320 

Trp Ala Gly Ser Lys Glu Thr Cys Asn Asp Arg Arg Thr Pro Ser Thr 
325 330 335 

Glu Lys Lys Val Asp Leu Asn Ala Asp Pro Leu Cys Glu Arg Lys Glu 
340 345 350 

Trp Asn Lys Gln Lys Leu Pro Cys Ser Glu Asn Pro Arg Asp Thr Glu 
355 360 365 

Asp Val Pro Trp Ile Thr Leu Asn Ser Ser Ile Gln Lys Val Asn Glu 
370 375 380 



Trp Phe Ser Arg Ser Asp Glu Leu Leu Gly Ser Asp Asp Ser His Asp 
385 390 395 400 

Gly Glu Ser Glu Ser Asn Ala Lys Val Ala Asp Val Leu Asp Val Leu 
405 410 415 

Asn Glu Val Asp Glu Tyr Ser Gly Ser Ser Glu Lys Ile Asp Leu Leu 
420 425 430 

Ala Ser Asp Pro His Glu Ala Leu Ile Cys Lys Ser Glu Arg Val His 
435 440 445 

Ser Lys Ser Val Glu Ser Asn Ile Glu Asp Lys Ile Phe Gly Lys Thr 
450 455 460 

Tyr Arg Lys Lys Ala Ser Leu Pro Asn Leu Ser His Val Thr Glu Asn 
465 470 475 480 

Leu Ile Ile Gly Ala Phe Val Thr Glu Pro Gln Ile Ile Gln Glu Arg 
485 490 495 



Pro Leu Thr Asn Lys Leu Lys Arg Lys Arg Arg Pro Thr Ser Gly Leu 
500 505 510 

His Pro Glu Asp Phe Ile Lys Lys Ala Asp Leu Ala Val Gln Lys Thr 
515 520 525 

Pro Glu Met Ile Asn Gln Gly Thr Asn Gln Thr Glu Gln Asn Gly Gln 
530 535 540 

Val Met Asn Ile Thr Asn Ser Gly His Glu Asn Lys Thr Lys Gly Asp 
545 550 555 560 

Ser Ile Gln Asn Glu Lys Asn Pro Asn Pro Ile Glu Ser Leu Glu Lys 
565 570 575 

Glu Ser Ala Phe Lys Thr Lys Ala Glu Pro Ile Ser Ser Ser Ile Ser 
580 585 590 

Asn Met Glu Leu Glu Leu Asn Ile His Asn Ser Lys Ala Pro Lys Lys 
595 600 605 

Asn Arg Leu Arg Arg Lys Ser Ser Thr Arg His Ile His Ala Leu Glu 
610 615 620 

Leu Val Val Ser Arg Asn Leu Ser Pro Pro Asn Cys Thr Glu Leu Gln 
625 630 635 640 

Ile Asp Ser Cys Ser Ser Ser Glu Glu Ile Lys Lys Lys Lys Tyr Asn 
645 650 655 

Gln Met Pro Val Arg His Ser Arg Asn Leu Gln Leu Met Glu Gly Lys 
660 665 670 

Glu Pro Ala Thr Gly Ala Lys Lys Ser Asn Lys Pro Asn Glu Gln Thr 
675 680 685 

Ser Lys Arg His Asp Ser Asp Thr Phe Pro Glu Leu Lys Leu Thr Asn 
690 695 700 

Ala Pro Gly Ser Phe Thr Lys Cys Ser Asn Thr Ser Glu Leu Lys Glu 
705 710 715 720 

Phe Val Asn Pro Ser Leu Pro Arg Glu Glu Lys Glu Glu Lys Leu Glu 
725 730 735 

Thr Val Lys Val Ser Asn Asn Ala Glu Asp Pro Lys Asp Leu Met Leu 
740 745 750 

Ser Gly Glu Arg Val Leu Gln Thr Glu Arg Ser Val Glu Ser Ser Ser 
755 760 765 

Ile Ser Leu Val Pro Gly Thr Asp Tyr Gly Thr Gln Glu Ser Ile Ser 
770 775 780 

Leu Leu Glu Val Ser Thr Leu Gly Lys Ala Lys Thr Glu Pro Asn Lys 
785 790 795 800 

Cys Val Ser Gln Cys Ala Ala Phe Glu Asn Pro Lys Gly Leu Ile His 
805 810 815 

Gly Cys Ser Lys Asp Asn Arg Asn Asp Thr Glu Gly Phe Lys Tyr Pro 
820 825 830 

Leu Gly His Glu Val Asn His Ser Arg Glu Thr Ser Ile Glu Met Glu 
835 840 845 

Glu Ser Glu Leu Asp Ala Gln Tyr Leu Gln Asn Thr Phe Lys Val Ser 
850 855 860 

Lys Arg Gln Ser Phe Ala Leu Phe Ser Asn Pro Gly Asn Ala Glu Glu 
865 870 875 880 

Glu Cys Ala Thr Phe Ser Ala His Ser Gly Ser Leu Lys Lys Gln Ser 
885 890 895 

Pro Lys Val Thr Phe Glu Cys Glu Gln Lys Glu Glu Asn Gln Gly Lys 
900 905 910 

Asn Glu Ser Asn Ile Lys Pro Val Gln Thr Val Asn Ile Thr Ala Gly 
915 920 925 

Phe Pro Val Val Gly Gln Lys Asp Lys Pro Val Asp Asn Ala Lys Cys 
930 935 940 



Ser Ile Lys Gly Gly Ser Arg Phe Cys Leu Ser Ser Gln Phe Arg Gly 
945 950 955 960 

Asn Glu Thr Gly Leu Ile Thr Pro Asn Lys His Gly Leu Leu Gln Asn 
965 970 975 

Pro Tyr Arg Ile Pro Pro Leu Phe Pro Ile Lys Ser Phe Val Lys Thr 
980 985 990 

Lys Cys Lys Lys Asn Leu Leu Glu Glu Asn Phe Glu Glu His Ser Met 
995 1000 1005 

Ser Pro Glu Arg Glu Met Gly Asn Glu Asn Ile Pro Ser Thr Val Ser 
1010 1015 1020 

Thr Ile Ser Arg Asn Asn Ile Arg Glu Asn Val Phe Lys Glu Ala Ser 
1025 1030 1035 1040 

Ser Ser Asn Ile Asn Glu Val Gly Ser Ser Thr Asn Glu Val Gly Ser 
1045 1050 1055 

Ser Ile Asn Glu Ile Gly Ser Ser Asp Glu Asn Ile Gln Ala Glu Leu 
1060 1065 1070 

Gly Arg Asn Arg Gly Pro Lys Leu Asn Ala Met Leu Arg Leu Gly Val 
1075 1080 1085 

Leu Gln Pro Glu Val Tyr Lys Gln Ser Leu Pro Gly Ser Asn Cys Lys 
1090 1095 1100 

His Pro Glu Ile Lys Lys Gln Glu Tyr Glu Glu Val Val Gln Thr Val 
1105 1110 1115 1120 

Asn Thr Asp Phe Ser Pro Tyr Leu Ile Ser Asp Asn Leu Glu Gln Pro 
1125 1130 1135 

Met Gly Ser Ser His Ala Ser Gln Val Cys Ser Glu Thr Pro Asp Asp 
1140 1145 1150 

Leu Leu Asp Asp Gly Glu Ile Lys Glu Asp Thr Ser Phe Ala Glu Asn 
1155 1160 1165 

Asp Ile Lys Glu Ser Ser Ala Val Phe Ser Lys Ser Val Gln Lys Gly 
1170 1175 1180 

Glu Leu Ser Arg Ser Pro Ser Pro Phe Thr His Thr His Leu Ala Gln 
1185 1190 1195 1200 

Gly Tyr Arg Arg Gly Ala Lys Lys Leu Glu Ser Ser Glu Glu Asn Leu 
1205 1210 1215 

Ser Ser Glu Asp Glu Glu Leu Pro Cys Phe Gln His Leu Leu Phe Gly 
1220 1225 1230 

Lys Val Asn Asn Ile Pro Ser Gln Ser Thr Arg His Ser Thr Val Ala 
1235 1240 1245 

Thr Glu Cys Leu Ser Lys Asn Thr Glu Glu Asn Leu Leu Ser Leu Lys 
1250 1255 1260 

Asn Ser Leu Asn Asp Cys Ser Asn Gln Val Ile Leu Ala Lys Ala Ser 
1265 1270 1275 1280 

Gln Glu His His Leu Ser Glu Glu Thr Lys Cys Ser Ala Ser Leu Phe 
1285 1290 1295 

Ser Ser Gln Cys Ser Glu Leu Glu Asp Leu Thr Ala Asn Thr Asn Thr 
1300 1305 1310 

Gln Asp Pro Phe Leu Ile Gly Ser Ser Lys Gln Met Arg His Gln Ser 
1315 1320 1325 

Glu Ser Gln Gly Val Gly Leu Ser Asp Lys Glu Leu Val Ser Asp Asp 
1330 1335 1340 

Glu Glu Arg Gly Thr Gly Leu Glu Glu Asn Asn Gln Glu Glu Gln Ser 
1345 1350 1355 1360 

Met Asp Ser Asn Leu Gly Glu Ala Ala Ser Gly Cys Glu Ser Glu Thr 
1365 1370 1375 

Ser Val Ser Glu Asp Cys Ser Gly Leu Ser Ser Gln Ser Asp Ile Leu 
1380 1385 1390 

Thr Thr Gln Gln Arg Asp Thr Met Gln His Asn Leu Ile Lys Leu Gln 
1395 1400 1405 

Gln Glu Met Ala Glu Leu Glu Ala Val Leu Glu Gln His Gly Ser Gln 
1410 1415 1420 

Pro Ser Asn Ser Tyr Pro Ser Ile Ile Ser Asp Ser Ser Ala Leu Glu 
1425 1430 1435 1440 

Asp Leu Arg Asn Pro Glu Gln Ser Thr Ser Glu Lys Ala Val Leu Thr 
1445 1450 1455 

Ser Gln Lys Ser Ser Glu Tyr Pro Ile Ser Gln Asn Pro Glu Gly Leu 
1460 1465 1470 

Ser Ala Asp Lys Phe Glu Val Ser Ala Asp Ser Ser Thr Ser Lys Asn 
1475 1480 1485 

Lys Glu Pro Gly Val Glu Arg Ser Ser Pro Ser Lys Cys Pro Ser Leu 
1490 1495 1500 

Asp Asp Arg Trp Tyr Met His Ser Cys Ser Gly Ser Leu Gln Asn Arg 
1505 1510 1515 1520 

Asn Tyr Pro Ser Gln Glu Glu Leu Ile Lys Val Val Asp Val Glu Glu 
1525 1530 1535 

Gln Gln Leu Glu Glu Ser Gly Pro His Asp Leu Thr Glu Thr Ser Tyr 
1540 1545 1550 

Leu Pro Arg Gln Asp Leu Glu Gly Thr Pro Tyr Leu Glu Ser Gly Ile 
1555 1560 1565 

Ser Leu Phe Ser Asp Asp Pro Glu Ser Asp Pro Ser Glu Asp Arg Ala 
1570 1575 1580 



Pro Glu Ser Ala Arg Val Gly Asn Ile Pro Ser Ser Thr Ser Ala Leu 
1585 1590 1595 1600 

Lys Val Pro Gln Leu Lys Val Ala Glu Ser Ala Gln Ser Pro Ala Ala 
1605 1610 1615 

Ala His Thr Thr Asp Thr Ala Gly Tyr Asn Ala Met Glu Glu Ser Val 
1620 1625 1630 

Ser Arg Glu Lys Pro Glu Leu Thr Ala Ser Thr Glu Arg Val Asn Lys 
1635 1640 1645 

Arg Met Ser Met Val Val Ser Gly Leu Thr Pro Glu Glu Phe Met Leu 
1650 1655 1660 

Val Tyr Lys Phe Ala Arg Lys His His Ile Thr Leu Thr Asn Leu Ile 
1665 1670 1675 1680 

Thr Glu Glu Thr Thr His Val Val Met Lys Thr Asp Ala Glu Phe Val 
1685 1690 1695 

Cys Glu Arg Thr Leu Lys Tyr Phe Leu Gly Ile Ala Gly Gly Lys Trp 
1700 1705 1710 

Val Val Ser Tyr Phe Trp Val Thr Gln Ser Ile Lys Glu Arg Lys Met 
1715 1720 1725 

Leu Asn Glu His Asp Phe Glu Val Arg Gly Asp Val Val Asn Gly Arg 
1730 1735 1740 

Asn His Gln Gly Pro Lys Arg Ala Arg Glu Ser Gln Asp Arg Lys Ile 
1745 1750 1755 1760 

Phe Arg Gly Leu Glu Ile Cys Cys Tyr Gly Pro Phe Thr Asn Met Pro 
1765 1770 1775 

Thr Asp Gln Leu Glu Trp Met Val Gln Leu Cys Gly Ala Ser Val Val 
1780 1785 1790 






Lys Glu Leu Ser Ser Phe Thr Leu Gly Thr Gly Val His Pro Ile Val 
1795 1800 1805 

Val Val Gln Pro Asp Ala Trp Thr Glu Asp Asn Gly Phe His Ala Ile 
1810 1815 1820 

Gly Gln Met Cys Glu Ala Pro Val Val Thr Arg Glu Trp Val Leu Asp 
1825 1830 1835 1840 

Ser Val Ala Leu Tyr Gln Cys Gln Glu Leu Asp Thr Tyr Leu Ile Pro 
1845 1850 1855 

Gln Ile Pro His Ser His Tyr 
1860 



(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5711 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(B) STRAIN: BRCA1 

(viii) POSITION IN GENOME: 

(A) CHROMOSOME/SEGMENT: 17 

(B) MAP POSITION: 17q21 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

AGCTCGCTGA GACTTCCTGG ACCCCGCACC AGGCTGTGGG GTTTCTCAGA TAACTGGGCC 60 

CCTGCGCTCA GGAGGCCTTC ACCCTCTGCT CTGGGTAAAG TTCATTGGAA CAGAAAGAAA 120 

TGGATTTATC TGCTCTTCGC GTTGAAGAAG TACAAAATGT CATTAATGCT ATGCAGAAAA 180 

TCTTAGAGTG TCCCATCTGT CTGGAGTTGA TCAAGGAACC TGTCTCCACA AAGTGTGACC 240 

ACATATTTTG CAAATTTTGC ATGCTGAAAC TTCTCAACCA GAAGAAAGGG CCTTCACAGT 300 

GTCCTTTATG TAAGAATGAT ATAACCAAAA GGAGCCTACA AGAAAGTACG AGATTTAGTC 360 

AACTTGTTGA AGAGCTATTG AAAATCATTT GTGCTTTTCA GCTTGACACA GGTTTGGAGT 420 

ATGCAAACAG CTATAATTTT GCAAAAAAGG AAAATAACTC TCCTGAACAT CTAAAAGATG 480 

AAGTTTCTAT CATCCAAAGT ATGGGCTACA GAAACCGTGC CAAAAGACTT CTACAGAGTG 540 

AACCCGAAAA TCCTTCCTTG CAGGAAACCA GTCTCAGTGT CCAACTCTCT AACCTTGGAA 600 

CTGTGAGAAC TCTGAGGACA AAGCAGCGGA TACAACCTCA AAAGACGTCT GTCTACATTG 660 

AATTGGGATC TGATTCTTCT GAAGATACCG TTAATAAGGC AACTTATTGC AGTGTGGGAG 720 

ATCAAGAATT GTTACAAATC ACCCCTCAAG GAACCAGGGA TGAAATCAGT TTGGATTCTG 780 

CAAAAAAGGC TGCTTGTGAA TTTTCTGAGA CGGATGTAAC AAATACTGAA CATCATCAAC 840 

CCAGTAATAA TGATTTGAAC ACCACTGAGA AGCGTGCAGC TGAGAGGCAT CCAGAAAAGT 900 

ATCAGGGTAG TTCTGTTTCA AACTTGCATG TGGAGCCATG TGGCACAAAT ACTCATGCCA 960 

GCTCATTACA GCATGAGAAC AGCAGTTTAT TACTCACTAA AGACAGAATG AATGTAGAAA 1020 

AGGCTGAATT CTGTAATAAA AGCAAACAGC CTGGCTTAGC AAGGAGCCAA CATAACAGAT 1080 

GGGCTGGAAG TAAGGAAACA TGTAATGATA GGCGGACTCC CAGCACAGAA AAAAAGGTAG 1140 

ATCTGAATGC TGATCCCCTG TGTGAGAGAA AAGAATGGAA TAAGCAGAAA CTGCCATGCT 1200 

CAGAGAATCC TAGAGATACT GAAGATGTTC CTTGGATAAC ACTAAATAGC AGCATTCAGA 1260 

AAGTTAATGA GTGGTTTTCC AGAAGTGATG AACTGTTAGG TTCTGATGAC TCACATGATG 1320 

GGGAGTCTGA ATCAAATGCC AAAGTAGCTG ATGTATTGGA CGTTCTAAAT GAGGTAGATG 1380 

AATATTCTGG TTCTTCAGAG AAAATAGACT TACTGGCCAG TGATCCTCAT GAGGCTTTAA 1440 

TATGTAAAAG TGAAAGAGTT CACTCCAAAT CAGTAGAGAG TAATATTGAA GACAAAATAT 1500 

TTGGGAAAAC CTATCGGAAG AAGGCAAGCC TCCCCAACTT AAGCCATGTA ACTGAAAATC 1560 

TAATTATAGG AGCATTTGTT ACTGAGCCAC AGATAATACA AGAGCGTCCC CTCACAAATA 1620 

AATTAAAGCG TAAAAGGAGA CCTACATCAG GCCTTCATCC TGAGGATTTT ATCAAGAAAG 1680 

CAGATTTGGC AGTTCAAAAG ACTCCTGAAA TGATAAATCA GGGAACTAAC CAAACGGAGC 1740 

AGAATGGTCA AGTGATGAAT ATTACTAATA GTGGTCATGA GAATAAAACA AAAGGTGATT 1800 

CTATTCAGAA TGAGAAAAAT CCTAACCCAA TAGAATCACT CGAAAAAGAA TCTGCTTTCA 1860 

AAACGAAAGC TGAACCTATA AGCAGCAGTA TAAGCAATAT GGAACTCGAA TTAAATATCC 1920 

ACAATTCAAA AGCACCTAAA AAGAATAGGC TGAGGAGGAA GTCTTCTACC AGGCATATTC 1980 

ATGCGCTTGA ACTAGTAGTC AGTAGAAATC TAAGCCCACC TAATTGTACT GAATTGCAAA 2040 

TTGATAGTTG TTCTAGCAGT GAAGAGATAA AGAAAAAAAA GTACAACCAA ATGCCAGTCA 2100 

GGCACAGCAG AAACCTACAA CTCATGGAAG GTAAAGAACC TGCAACTGGA GCCAAGAAGA 2160 

GTAACAAGCC AAATGAACAG ACAAGTAAAA GACATGACAG TGATACTTTC CCAGAGCTGA 2220 

AGTTAACAAA TGCACCTGGT TCTTTTACTA AGTGTTCAAA TACCAGTGAA CTTAAAGAAT 2280 

TTGTCAATCC TAGCCTTCCA AGAGAAGAAA AAGAAGAGAA ACTAGAAACA GTTAAAGTGT 2340 

CTAATAATGC TGAAGACCCC AAAGATCTCA TGTTAAGTGG AGAAAGGGTT TTGCAAACTG 2400 

AAAGATCTGT AGAGAGTAGC AGTATTTCAC TGGTACCTGG TACTGATTAT GGCACTCAGG 2460 

AAAGTATCTC GTTACTGGAA GTTAGCACTC TAGGGAAGGC AAAAACAGAA CCAAATAAAT 2520 

GTGTGAGTCA GTGTGCAGCA TTTGAAAACC CCAAGGGACT AATTCATGGT TGTTCCAAAG 2580 

ATAATAGAAA TGACACAGAA GGCTTTAAGT ATCCATTGGG ACATGAAGTT AACCACAGTC 2640 

GGGAAACAAG CATAGAAATG GAAGAAAGTG AACTTGATGC TCAGTATTTG CAGAATACAT 2700 

TCAAGGTTTC AAAGCGCCAG TCATTTGCTC TGTTTTCAAA TCCAGGAAAT GCAGAAGAGG 2760 

AATGTGCAAC ATTCTCTGCC CACTCTGGGT CCTTAAAGAA ACAAAGTCCA AAAGTCACTT 2820 

TTGAATGTGA ACAAAAGGAA GAAAATCAAG GAAAGAATGA GTCTAATATC AAGCCTGTAC 2880 

AGACAGTTAA TATCACTGCA GGCTTTCCTG TGGTTGGTCA GAAAGATAAG CCAGTTGATA 2940 

ATGCCAAATG TAGTATCAAA GGAGGCTCTA GGTTTTGTCT ATCATCTCAG TTCAGAGGCA 3000 

ACGAAACTGG ACTCATTACT CCAAATAAAC ATGGACTTTT ACAAAACCCA TATCGTATAC 3060 

CACCACTTTT TCCCATCAAG TCATTTGTTA AAACTAAATG TAAGAAAAAT CTGCTAGAGG 3120 

AAAACTTTGA GGAACATTCA ATGTCACCTG AAAGAGAAAT GGGAAATGAG AACATTCCAA 3180 

GTACAGTGAG CACAATTAGC CGTAATAACA TTAGAGAAAA TGTTTTTAAA GGAGCCAGCT 3240 

CAAGCAATAT TAATGAAGTA GGTTCCAGTA CTAATGAAGT GGGCTCCAGT ATTAATGAAA 3300 

TAGGTTCCAG TGATGAAAAC ATTCAAGCAG AACTAGGTAG AAACAGAGGG CCAAAATTGA 3360 

ATGCTATGCT TAGATTAGGG GTTTTGCAAC CTGAGGTCTA TAAACAAAGT CTTCCTGGAA 3420 

GTAATTGTAA GCATCCTGAA ATAAAAAAGC AAGAATATGA AGAAGTAGTT CAGACTGTTA 3480 

ATACAGATTT CTCTCCATAT CTGATTTCAG ATAACTTAGA ACAGCCTATG GGAAGTAGTC 3540 

ATGCATCTCA GGTTTGTTCT GAGACACCTG ATGACCTGTT AGATGATGGT GAAATAAAGG 3600 

AAGATACTAG TTTTGCTGAA AATGACATTA AGGAAAGTTC TGCTGTTTTT AGCAAAAGCG 3660 

TCCAGAGAGG AGAGCTTAGC AGGAGTCCTA GCCCTTTCAC CCATACACAT TTGGCTCAGG 3720 

GTTACCGAAG AGGGGCCAAG AAATTAGAGT CCTCAGAAGA GAACTTATCT AGTGAGGATG 3780 

AAGAGCTTCC CTGCTTCCAA CACTTGTTAT TTGGTAAAGT AAACAATATA CCTTCTCAGT 3840 

CTACTAGGCA TAGCACCGTT GCTACCGAGT GTCTGTCTAA GAACACAGAG GAGAATTTAT 3900 

TATCATTGAA GAATAGCTTA AATGACTGCA GTAACCAGGT AATATTGGCA AAGGCATCTC 3960 

AGGAACATCA CCTTAGTGAG GAAACAAAAT GTTCTGCTAG CTTGTTTTCT TCACAGTGCA 4020 

GTGAATTGGA AGACTTGACT GCAAATACAA ACACCCAGGA TCCTTTCTTG ATTGGTTCTT 4080 

CCAAACAAAT GAGGCATCAG TCTGAAAGCC AGGGAGTTGG TCTGAGTGAC AAGGAATTGG 4140 

TTTCAGATGA TGAAGAAAGA GGAACGGGCT TGGAAGAAAA TAATCAAGAA GAGCAAAGCA 4200 

TGGATTCAAA CTTAGGTGAA GCAGCATCTG GGTGTGAGAG TGAAACAAGC GTCTCTGAAG 4260 

ACTGCTCAGG GCTATCCTCT CAGAGTGACA TTTTAACCAC TCAGCAGAGG GATACCATGC 4320 

AACATAACCT GATAAAGCTC CAGCAGGAAA TGGCTGAACT AGAAGCTGTG TTAGAACAGC 4380 

ATGGGAGCCA GCCTTCTAAC AGCTACCCTT CCATCATAAG TGACTCTTCT GCCCTTGAGG 4440 

ACCTGCGAAA TCCAGAACAA AGCACATCAG AAAAAGCAGT ATTAACTTCA CAGAAAAGTA 4500 

GTGAATACCC TATAAGCCAG AATCCAGAAG GCCTTTCTGC TGACAAGTTT GAGGTGTCTG 4560 

CAGATAGTTC TACCAGTAAA AATAAAGAAC CAGGAGTGGA AAGGTCATCC CCTTCTAAAT 4620 

GCCCATCATT AGATGATAGG TGGTACATGC ACAGTTGCTC TGGGAGTCTT CAGAATAGAA 4680 

ACTACCCATC TCAAGAGGAG CTCATTAAGG TTGTTGATGT GGAGGAGCAA CAGCTGGAAG 4740 

AGTCTGGGCC ACACGATTTG ACGGAAACAT CTTACTTGCC AAGGCAAGAT CTAGAGGGAA 4800 

CCCCTTACCT GGAATCTGGA ATCAGCCTCT TCTCTGATGA CCCTGAATCT GATCCTTCTG 4860 

AAGACAGAGC CCCAGAGTCA GCTCGTGTTG GCAACATACC ATCTTCAACC TCTGCATTGA 4920 

AAGTTCCCCA ATTGAAAGTT GCAGAATCTG CCCAGGGTCC AGCTGCTGCT CATACTACTG 4980 

ATACTGCTGG GTATAATGCA ATGGAAGAAA GTGTGAGCAG GGAGAAGCCA GAATTGACAG 5040 

CTTCAACAGA AAGGGTCAAC AAAAGAATGT CCATGGTGGT GTCTGGCCTG ACCCCAGAAG 5100 

AATTTATGCT CGTGTACAAG TTTGCCAGAA AACACCACAT CACTTTAACT AATCTAATTA 5160 

CTGAAGAGAC TACTCATGTT GTTATGAAAA CAGATGCTGA GTTTGTGTGT GAACGGACAC 5220 

TGAAATATTT TCTAGGAATT GCGGGAGGAA AATGGGTAGT TAGCTATTTC TGGGTGACCC 5280 

AGTCTATTAA AGAAAGAAAA ATGCTGAATG AGCATGATTT TGAAGTCAGA GGAGATGTGG 5340 

TCAATGGAAG AAACCACCAA GGTCCAAAGC GAGCAAGAGA ATCCCAGGAC AGAAAGATCT 5400 

TCAGGGGGCT AGAAATCTGT TGCTATGGGC CCTTCACCAA CATGCCCACA GATCAACTGG 5460 

AATGGATGGT ACAGCTGTGT GGTGCTTCTG TGGTGAAGGA GCTTTCATCA TTCACCCTTG 5520 

GCACAGGTGT CCACCCAATT GTGGTTGTGC AGCCAGATGC CTGGACAGAG GACAATGGCT 5580 

TCCATGCAAT TGGGCAGATG TGTGAGGCAC CTGTGGTGAC CCGAGAGTGG GTGTTGGACA 5640 

GTGTAGCACT CTACCAGTGC CAGGAGCTGG ACACCTACCT GATACCCCAG ATCCCCCACA 5700 

GCCACTACTG A 5711 
(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1863 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(B) STRAIN: BRCA1 

(viii) POSITION IN GENOME: 

(A) CHROMOSOME/SEGMENT: 17 

(B) MAP POSITION: 17q21 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

Met Asp Leu Ser Ala Leu Arg Val Glu Glu Val Gln Asn Val Ile Asn 
1 5 10 15 

Ala Met Gln Lys Ile Leu Glu Cys Pro Ile Cys Leu Glu Leu Ile Lys 
20 25 30 

Glu Pro Val Ser Thr Lys Cys Asp His Ile Phe Cys Lys Phe Cys Met 
35 40 45 

Leu Lys Leu Leu Asn Gln Lys Lys Gly Pro Ser Gln Cys Pro Leu Cys 
50 55 60 

Lys Asn Asp Ile Thr Lys Arg Ser Leu Gln Glu Ser Thr Arg Phe Ser 
65 70 75 80 

Gln Leu Val Glu Glu Leu Leu Lys Ile Ile Cys Ala Phe Gln Leu Asp 
85 90 95 

Thr Gly Leu Glu Tyr Ala Asn Ser Tyr Asn Phe Ala Lys Lys Glu Asn 
100 105 110 

Asn Ser Pro Glu His Leu Lys Asp Glu Val Ser Ile Ile Gln Ser Met 
115 120 125 

Gly Tyr Arg Asn Arg Ala Lys Arg Leu Leu Gln Ser Glu Pro Glu Asn 
130 135 140 

Pro Ser Leu Gln Glu Thr Ser Leu Ser Val Gln Leu Ser Asn Leu Gly 
145 150 155 160 



Thr Val Arg Thr Leu Arg Thr Lys Gln Arg Ile Gln Pro Gln Lys Thr 
165 170 175 

Ser Val Tyr Ile Glu Leu Gly Ser Asp Ser Ser Glu Asp Thr Val Asn 
180 185 190 

Lys Ala Thr Tyr Cys Ser Val Gly Asp Gln Glu Leu Leu Gln Ile Thr 
195 200 205 

Pro Gln Gly Thr Arg Asp Glu Ile Ser Leu Asp Ser Ala Lys Lys Ala 
210 215 220 

Ala Cys Glu Phe Ser Glu Thr Asp Val Thr Asn Thr Glu His His Gln 
225 230 235 240 

Pro Ser Asn Asn Asp Leu Asn Thr Thr Glu Lys Arg Ala Ala Glu Arg 
245 250 255 

His Pro Glu Lys Tyr Gln Gly Ser Ser Val Ser Asn Leu His Val Glu 
260 265 270 

Pro Cys Gly Thr Asn Thr His Ala Ser Ser Leu Gln His Glu Asn Ser 
275 280 285 

Ser Leu Leu Leu Thr Lys Asp Arg Met Asn Val Glu Lys Ala Glu Phe 
290 295 300 

Cys Asn Lys Ser Lys Gln Pro Gly Leu Ala Arg Ser Gln His Asn Arg 
305 310 315 320 

Trp Ala Gly Ser Lys Glu Thr Cys Asn Asp Arg Arg Thr Pro Ser Thr 
325 330 335 

Glu Lys Lys Val Asp Leu Asn Ala Asp Pro Leu Cys Glu Arg Lys Glu 
340 345 350 



Trp Asn Lys Gln Lys Leu Pro Cys Ser Glu Asn Pro Arg Asp Thr Glu 
355 360 365 

Asp Val Pro Trp Ile Thr Leu Asn Ser Ser Ile Gln Lys Val Asn Glu 
370 375 380 

Trp Phe Ser Arg Ser Asp Glu Leu Leu Gly Ser Asp Asp Ser His Asp 
385 390 395 400 

Gly Glu Ser Glu Ser Asn Ala Lys Val Ala Asp Val Leu Asp Val Leu 
405 410 415 

Asn Glu Val Asp Glu Tyr Ser Gly Ser Ser Glu Lys Ile Asp Leu Leu 
420 425 430 

Ala Ser Asp Pro His Glu Ala Leu Ile Cys Lys Ser Glu Arg Val His 
435 440 445 

Ser Lys Ser Val Glu Ser Asn Ile Glu Asp Lys Ile Phe Gly Lys Thr 
450 455 460 

Tyr Arg Lys Lys Ala Ser Leu Pro Asn Leu Ser His Val Thr Glu Asn 
465 470 475 480 

Leu Ile Ile Gly Ala Phe Val Thr Glu Pro Gln Ile Ile Gln Glu Arg 
485 490 495 

Pro Leu Thr Asn Lys Leu Lys Arg Lys Arg Arg Pro Thr Ser Gly Leu 
500 505 510 

His Pro Glu Asp Phe He Lys Lys Ala Asp Leu Ala Val Gin Lys Thr 
515 520 525 

Pro Glu Met He Asn Gin Gly Thr Asn Gin Thr Glu Gin Asn Gly Gin 
530 535 540 

Val Met Asn He Thr Asn Ser Gly His Glu Asn Lys Thr Lys Gly Asp 
545 550 555 560 

Ser Ile Gln Asn Glu Lys Asn Pro Asn Pro Ile Glu Ser Leu Glu Lys
565 570 575

Glu Ser Ala Phe Lys Thr Lys Ala Glu Pro Ile Ser Ser Ser Ile Ser
580 585 590

Asn Met Glu Leu Glu Leu Asn Ile His Asn Ser Lys Ala Pro Lys Lys
595 600 605

Asn Arg Leu Arg Arg Lys Ser Ser Thr Arg His Ile His Ala Leu Glu
610 615 620

Leu Val Val Ser Arg Asn Leu Ser Pro Pro Asn Cys Thr Glu Leu Gln
625 630 635 640

Ile Asp Ser Cys Ser Ser Ser Glu Glu Ile Lys Lys Lys Lys Tyr Asn
645 650 655

Gln Met Pro Val Arg His Ser Arg Asn Leu Gln Leu Met Glu Gly Lys
660 665 670

Glu Pro Ala Thr Gly Ala Lys Lys Ser Asn Lys Pro Asn Glu Gln Thr
675 680 685

Ser Lys Arg His Asp Ser Asp Thr Phe Pro Glu Leu Lys Leu Thr Asn
690 695 700

Ala Pro Gly Ser Phe Thr Lys Cys Ser Asn Thr Ser Glu Leu Lys Glu
705 710 715 720

Phe Val Asn Pro Ser Leu Pro Arg Glu Glu Lys Glu Glu Lys Leu Glu
725 730 735

Thr Val Lys Val Ser Asn Asn Ala Glu Asp Pro Lys Asp Leu Met Leu
740 745 750

Ser Gly Glu Arg Val Leu Gln Thr Glu Arg Ser Val Glu Ser Ser Ser
755 760 765

Ile Ser Leu Val Pro Gly Thr Asp Tyr Gly Thr Gln Glu Ser Ile Ser
770 775 780

Leu Leu Glu Val Ser Thr Leu Gly Lys Ala Lys Thr Glu Pro Asn Lys
785 790 795 800

Cys Val Ser Gln Cys Ala Ala Phe Glu Asn Pro Lys Gly Leu Ile His
805 810 815

Gly Cys Ser Lys Asp Asn Arg Asn Asp Thr Glu Gly Phe Lys Tyr Pro
820 825 830

Leu Gly His Glu Val Asn His Ser Arg Glu Thr Ser Ile Glu Met Glu
835 840 845

Glu Ser Glu Leu Asp Ala Gln Tyr Leu Gln Asn Thr Phe Lys Val Ser
850 855 860

Lys Arg Gln Ser Phe Ala Leu Phe Ser Asn Pro Gly Asn Ala Glu Glu
865 870 875 880

Glu Cys Ala Thr Phe Ser Ala His Ser Gly Ser Leu Lys Lys Gln Ser
885 890 895

Pro Lys Val Thr Phe Glu Cys Glu Gln Lys Glu Glu Asn Gln Gly Lys
900 905 910

Asn Glu Ser Asn Ile Lys Pro Val Gln Thr Val Asn Ile Thr Ala Gly
915 920 925

Phe Pro Val Val Gly Gln Lys Asp Lys Pro Val Asp Asn Ala Lys Cys
930 935 940

Ser Ile Lys Gly Gly Ser Arg Phe Cys Leu Ser Ser Gln Phe Arg Gly
945 950 955 960

Asn Glu Thr Gly Leu Ile Thr Pro Asn Lys His Gly Leu Leu Gln Asn
965 970 975

Pro Tyr Arg Ile Pro Pro Leu Phe Pro Ile Lys Ser Phe Val Lys Thr
980 985 990

Lys Cys Lys Lys Asn Leu Leu Glu Glu Asn Phe Glu Glu His Ser Met
995 1000 1005

Ser Pro Glu Arg Glu Met Gly Asn Glu Asn Ile Pro Ser Thr Val Ser
1010 1015 1020



Thr Ile Ser Arg Asn Asn Ile Arg Glu Asn Val Phe Lys Gly Ala Ser
1025 1030 1035 1040

Ser Ser Asn Ile Asn Glu Val Gly Ser Ser Thr Asn Glu Val Gly Ser
1045 1050 1055

Ser Ile Asn Glu Ile Gly Ser Ser Asp Glu Asn Ile Gln Ala Glu Leu
1060 1065 1070

Gly Arg Asn Arg Gly Pro Lys Leu Asn Ala Met Leu Arg Leu Gly Val
1075 1080 1085

Leu Gln Pro Glu Val Tyr Lys Gln Ser Leu Pro Gly Ser Asn Cys Lys
1090 1095 1100

His Pro Glu Ile Lys Lys Gln Glu Tyr Glu Glu Val Val Gln Thr Val
1105 1110 1115 1120

Asn Thr Asp Phe Ser Pro Tyr Leu Ile Ser Asp Asn Leu Glu Gln Pro
1125 1130 1135

Met Gly Ser Ser His Ala Ser Gln Val Cys Ser Glu Thr Pro Asp Asp
1140 1145 1150

Leu Leu Asp Asp Gly Glu Ile Lys Glu Asp Thr Ser Phe Ala Glu Asn
1155 1160 1165

Asp Ile Lys Glu Ser Ser Ala Val Phe Ser Lys Ser Val Gln Arg Gly
1170 1175 1180

Glu Leu Ser Arg Ser Pro Ser Pro Phe Thr His Thr His Leu Ala Gln
1185 1190 1195 1200

Gly Tyr Arg Arg Gly Ala Lys Lys Leu Glu Ser Ser Glu Glu Asn Leu
1205 1210 1215

Ser Ser Glu Asp Glu Glu Leu Pro Cys Phe Gln His Leu Leu Phe Gly
1220 1225 1230

Lys Val Asn Asn Ile Pro Ser Gln Ser Thr Arg His Ser Thr Val Ala
1235 1240 1245

Thr Glu Cys Leu Ser Lys Asn Thr Glu Glu Asn Leu Leu Ser Leu Lys
1250 1255 1260

Asn Ser Leu Asn Asp Cys Ser Asn Gln Val Ile Leu Ala Lys Ala Ser
1265 1270 1275 1280

Gln Glu His His Leu Ser Glu Glu Thr Lys Cys Ser Ala Ser Leu Phe
1285 1290 1295

Ser Ser Gln Cys Ser Glu Leu Glu Asp Leu Thr Ala Asn Thr Asn Thr
1300 1305 1310

Gln Asp Pro Phe Leu Ile Gly Ser Ser Lys Gln Met Arg His Gln Ser
1315 1320 1325

Glu Ser Gln Gly Val Gly Leu Ser Asp Lys Glu Leu Val Ser Asp Asp
1330 1335 1340

Glu Glu Arg Gly Thr Gly Leu Glu Glu Asn Asn Gln Glu Glu Gln Ser
1345 1350 1355 1360

Met Asp Ser Asn Leu Gly Glu Ala Ala Ser Gly Cys Glu Ser Glu Thr
1365 1370 1375

Ser Val Ser Glu Asp Cys Ser Gly Leu Ser Ser Gln Ser Asp Ile Leu
1380 1385 1390

Thr Thr Gln Gln Arg Asp Thr Met Gln His Asn Leu Ile Lys Leu Gln
1395 1400 1405

Gln Glu Met Ala Glu Leu Glu Ala Val Leu Glu Gln His Gly Ser Gln
1410 1415 1420

Pro Ser Asn Ser Tyr Pro Ser Ile Ile Ser Asp Ser Ser Ala Leu Glu
1425 1430 1435 1440

Asp Leu Arg Asn Pro Glu Gln Ser Thr Ser Glu Lys Ala Val Leu Thr
1445 1450 1455

Ser Gln Lys Ser Ser Glu Tyr Pro Ile Ser Gln Asn Pro Glu Gly Leu
1460 1465 1470

Ser Ala Asp Lys Phe Glu Val Ser Ala Asp Ser Ser Thr Ser Lys Asn
1475 1480 1485

Lys Glu Pro Gly Val Glu Arg Ser Ser Pro Ser Lys Cys Pro Ser Leu
1490 1495 1500

Asp Asp Arg Trp Tyr Met His Ser Cys Ser Gly Ser Leu Gln Asn Arg
1505 1510 1515 1520

Asn Tyr Pro Ser Gln Glu Glu Leu Ile Lys Val Val Asp Val Glu Glu
1525 1530 1535

Gln Gln Leu Glu Glu Ser Gly Pro His Asp Leu Thr Glu Thr Ser Tyr
1540 1545 1550

Leu Pro Arg Gln Asp Leu Glu Gly Thr Pro Tyr Leu Glu Ser Gly Ile
1555 1560 1565

Ser Leu Phe Ser Asp Asp Pro Glu Ser Asp Pro Ser Glu Asp Arg Ala
1570 1575 1580

Pro Glu Ser Ala Arg Val Gly Asn Ile Pro Ser Ser Thr Ser Ala Leu
1585 1590 1595 1600

Lys Val Pro Gln Leu Lys Val Ala Glu Ser Ala Gln Gly Pro Ala Ala
1605 1610 1615

Ala His Thr Thr Asp Thr Ala Gly Tyr Asn Ala Met Glu Glu Ser Val
1620 1625 1630

Ser Arg Glu Lys Pro Glu Leu Thr Ala Ser Thr Glu Arg Val Asn Lys
1635 1640 1645

Arg Met Ser Met Val Val Ser Gly Leu Thr Pro Glu Glu Phe Met Leu
1650 1655 1660

Val Tyr Lys Phe Ala Arg Lys His His Ile Thr Leu Thr Asn Leu Ile
1665 1670 1675 1680

Thr Glu Glu Thr Thr His Val Val Met Lys Thr Asp Ala Glu Phe Val
1685 1690 1695

Cys Glu Arg Thr Leu Lys Tyr Phe Leu Gly Ile Ala Gly Gly Lys Trp
1700 1705 1710

Val Val Ser Tyr Phe Trp Val Thr Gln Ser Ile Lys Glu Arg Lys Met
1715 1720 1725

Leu Asn Glu His Asp Phe Glu Val Arg Gly Asp Val Val Asn Gly Arg
1730 1735 1740

Asn His Gln Gly Pro Lys Arg Ala Arg Glu Ser Gln Asp Arg Lys Ile
1745 1750 1755 1760

Phe Arg Gly Leu Glu Ile Cys Cys Tyr Gly Pro Phe Thr Asn Met Pro
1765 1770 1775

Thr Asp Gln Leu Glu Trp Met Val Gln Leu Cys Gly Ala Ser Val Val
1780 1785 1790

Lys Glu Leu Ser Ser Phe Thr Leu Gly Thr Gly Val His Pro Ile Val
1795 1800 1805

Val Val Gln Pro Asp Ala Trp Thr Glu Asp Asn Gly Phe His Ala Ile
1810 1815 1820

Gly Gln Met Cys Glu Ala Pro Val Val Thr Arg Glu Trp Val Leu Asp
1825 1830 1835 1840

Ser Val Ala Leu Tyr Gln Cys Gln Glu Leu Asp Thr Tyr Leu Ile Pro
1845 1850 1855

Gln Ile Pro His Ser His Tyr
1860
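
For comparison against other records, a listing in this three-letter format may be collapsed to one-letter code. The following is a minimal sketch and is not part of the original disclosure; the names `THREE_TO_ONE` and `to_one_letter` are illustrative assumptions. It skips the position numbers printed beneath each row and raises a `KeyError` on any token that is not a standard three-letter residue name, which may also help flag transcription errors.

```python
# Illustrative helper (hypothetical names, not from the patent text).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(listing: str) -> str:
    """Convert three-letter residue names to one-letter code.

    Purely numeric tokens (the position markers under each row) are skipped.
    Any other unrecognised token raises KeyError.
    """
    residues = []
    for token in listing.split():
        if token.isdigit():
            continue  # position number, not a residue
        residues.append(THREE_TO_ONE[token])
    return "".join(residues)


# Final row of the amino acid listing, residues 1857 to 1863.
last_row = """Gln Ile Pro His Ser His Tyr
1860"""
print(to_one_letter(last_row))  # QIPHSHY
```
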
(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 2F primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO 
GAAGTTGTCA TTTTATAAAC CTTT 
(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 2R primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO 
TGTCTTTTCT TCCCTAGTAT GT 
(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 3F primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO 

TCCTGACACA GCAGACATTT A 
(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 3R primer 

(xi) SEQUENCE DESCRIPTION : SEQ ID NO 
TTGGATTTTC GTTCTCACTT A 
(2) INFORMATION FOR SEQ ID NO: 11: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 5F primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO 
CTCTTAAGGG CAGTTGTGAG 

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(B) STRAIN: 5R-M13* primer



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 
TTCCTACTGT GGTTGCTTCC 

(2) INFORMATION FOR SEQ ID NO: 13: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 6/7F primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 1 
CTTATTTTAG TGTCCTTAAA AGG 
(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 
(B) STRAIN: 6R 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 1 
TTTCATGGAC AGCACTTGAG TG 
(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(B) STRAIN: 7F primer



(xi) SEQUENCE DESCRIPTION: SEQ ID NO 
CACAACAAAG AGCATACATA GGG 
(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 6/7R primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 

TCGGGTTCAC TCTGTAGAAG 
(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(B) STRAIN: 8F1 primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO 

TTCTCTTCAG GAGGAAAAGC A 

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 8R1 primer 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 
GCTGCCTACC ACAAATACAA A 
(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(B) STRAIN: 9F primer



(xi) SEQUENCE DESCRIPTION: SEQ ID NO 
CCACAGTAGA TGCTCAGTAA ATA 

(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 9R primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO 
TAGGAAAATA CCAGCTTCAT AGA 

(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(B) STRAIN: 10F primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO 
TGGTCAGCTT TCTGTAATCG 
(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(B) STRAIN: 10R primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO 
GTATCTACCC ACTCTCTTCT TCAG 
(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(vi) ORIGINAL SOURCE:

(B) STRAIN: 11AF primer



(xi) SEQUENCE DESCRIPTION: SEQ ID NO 
CCACCTCCAA GGTGTATCA 

(2) INFORMATION FOR SEQ ID NO: 24: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(B) STRAIN: 11AR primer



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 
TGTTATGTTG GCTCCTTGCT 
(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 11BF1 primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25



CACTAAAGAC AGAATGAATC TA 
(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 11BR1 primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26 

GAAGAACCAG AATATTCATC TA

(2) INFORMATION FOR SEQ ID NO: 27: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 11CF1 primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27
TGATGGGGAG TCTGAATCAA 20

(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 11CR1 primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:
TCTGCTTTCT TGATAAAATC CT 22

(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 11DF1 primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 
AGCGTCCCCT CACAAATAAA 20

(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 11DR1 primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO 
TCAAGCGCAT GAATATGCCT

(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 11EF primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 
GTATAAGCAA TATGGAACTC GA 
(2) INFORMATION FOR SEQ ID NO: 32: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(B) STRAIN: 11ER primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:
TTAAGTTCAC TGGTATTTGA ACA

(2) INFORMATION FOR SEQ ID NO: 33: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 11FF primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 

GACAGCGATA CTTTCCCAGA 

(2) INFORMATION FOR SEQ ID NO: 34:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 11FR primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO 
TGGAACAACC ATGAATTAGT C 
(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 11GF primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 
GGAAGTTAGC ACTCTAGGGA 

(2) INFORMATION FOR SEQ ID NO: 36:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(B) STRAIN: 11GR primer



(xi) SEQUENCE DESCRIPTION: SEQ ID NO 
GCAGTGATAT TAACTGTCTG TA 
(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 11HF primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO 
TGGGTCCTTA AAGAAACAAA GT 
(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(B) STRAIN: 11HR primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO
TCAGGTGACA TTGAATCTTC C 

(2) INFORMATION FOR SEQ ID NO: 39: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE : 

(B) STRAIN: 11IF primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39 
CCACTTTTTC CCATCAAGTC A 
(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(vi) ORIGINAL SOURCE:

(B) STRAIN: 11IR primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO
TCAGGATGCT TACAATTACT TC
(2) INFORMATION FOR SEQ ID NO: 41:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 11JF primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 

CAAAATTGAA TGCTATGCTT AGA 

(2) INFORMATION FOR SEQ ID NO: 42: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 11JR primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO
TCGGTAACCC TGAGCCAAAT 
(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 11KF primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 
GCAAAAGCGT CCAGAAAGGA 
(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 11KR-1 primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:
TATTTGCAGT CAAGTCTTCC AA

(2) INFORMATION FOR SEQ ID NO: 45: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 11LF-1 primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO
GTAATATTGG CAAAGGCATC T
(2) INFORMATION FOR SEQ ID NO: 46:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE:

(B) STRAIN: 11LR primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 
TAAAATGTGC TCCCCAAAAG CA

(2) INFORMATION FOR SEQ ID NO: 47:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 12F primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO

GTCCTGCCAA TGAGAAGAAA 

(2) INFORMATION FOR SEQ ID NO: 48: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 12R primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 
TGTCAGCAAA CCTAAGAATG T 
(2) INFORMATION FOR SEQ ID NO: 49: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(B) STRAIN: 13F primer



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 
AATGGAAAGC TTCTCAAAGT A 
(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 13R primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 5 

ATGTTGGAGC TAGGTCCTTA C 

(2) INFORMATION FOR SEQ ID NO: 51: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 14F primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO 
CTAACCTGAA TTATCACTAT CA 
(2) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(vi) ORIGINAL SOURCE: 
(B) STRAIN: 14R primer



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 
GTGTATAAAT GCCTGTATGC A 
(2) INFORMATION FOR SEQ ID NO: 53:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 15F primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO 
TGGCTGCCCA GGAAGTATG 

(2) INFORMATION FOR SEQ ID NO: 54: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 15R primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:

AACCAGAATA TCTTTATGTA GGA 
(2) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(B) STRAIN: 16F primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO
AATTCTTAAC AGAGACCAGA AC
(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 16R primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:

AAAACTCTTT CCAGAATGTT GT

(2) INFORMATION FOR SEQ ID NO: 57: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 17F primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO
GTGTAGAACG TGCAGGATTG 
(2) INFORMATION FOR SEQ ID NO: 58: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 17R primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:

TCGCCTCATG TGGTTTTA

(2) INFORMATION FOR SEQ ID NO: 59:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 18F primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO

GGCTCTTTAG CTTCTTAGGA C 

(2) INFORMATION FOR SEQ ID NO: 60: 
(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 18R primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO 



GAGACCATTT TCCCAGCATC

(2) INFORMATION FOR SEQ ID NO: 61:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 19F primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO 
CTGTCATTCT TCCTGTGCTC 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 20R primer



(xi) SEQUENCE DESCRIPTION: SEQ ID NO 

GGGAATCCAA ATTACACAGC

(2) INFORMATION FOR SEQ ID NO: 65:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(B) STRAIN: 21F primer



(xi) SEQUENCE DESCRIPTION: SEQ ID NO 

AAGCTCTTCC TTTTTGAAAG TC 

(2) INFORMATION FOR SEQ ID NO: 66: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 21R primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO 

GTAGAGAAAT AGAATAGCCT CT 

(2) INFORMATION FOR SEQ ID NO: 67:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(B) STRAIN: 22F primer



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 

TCCCATTGAG AGGTCTTGCT 

(2) INFORMATION FOR SEQ ID NO: 68: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 20 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 22R primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO 

GAGAAGACTT CTGAGGCTAC 

(2) INFORMATION FOR SEQ ID NO: 69: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE : nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 23F-1 primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO 
TGAAGTGACA GTTCCAGTAG T
(2) INFORMATION FOR SEQ ID NO: 70:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(B) STRAIN: 23R-1 primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 
CATTTTAGCC ATTCATTCAA CAA 
(2) INFORMATION FOR SEQ ID NO: 71: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(B) STRAIN: 24F primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 7

ATGAATTGAC ACTAATCTCT GC 

(2) INFORMATION FOR SEQ ID NO: 72: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 24R primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 72:
GTAGCCAGGA CAGTAGAAGG A
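
Each primer entry above states a LENGTH that can be checked against its sequence line. The following is a minimal sketch of such a check and is not part of the original disclosure; the helper name `primer_length` is a hypothetical choice, and the sketch assumes sequences are written in space-separated blocks containing only A, C, G and T.

```python
def primer_length(sequence_line: str) -> int:
    """Return the number of bases in a primer written in space-separated blocks.

    Raises ValueError if the line contains anything other than A, C, G or T,
    which can help flag a transcription error.
    """
    bases = "".join(sequence_line.split()).upper()
    unexpected = set(bases) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return len(bases)


# SEQ ID NO: 72 (24R primer) is declared as 21 base pairs.
declared_length = 21
observed_length = primer_length("GTAGCCAGGA CAGTAGAAGG A")
print(observed_length == declared_length)  # True
```
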